Preface



This volume contains the invited papers and papers selected for presentation at 
the 25th conference on Theory and Practice of Informatics — SOFSEM ’98, held 
in Jasna, Slovakia, November 21-27, 1998. 

The SOFSEM conference series started in 1974 as a local event in Czecho- 
slovakia and from the very beginning became the top domestic event in software 
theory and practice. It has been unique in several respects, being a mix of win- 
ter school, conference, and advanced workshop. It brought together professionals 
from academia and industry and provided an opportunity for both theoretici- 
ans and practitioners to learn about the new developments in a broad range of 
computer science subjects via a series of invited talks. The conference gradually 
evolved into an international event, keeping most of its original characteristics. It 
features a relatively large number of invited talks, refereed papers (contributed 
papers), and refereed poster contributions. In addition, time and space for flash 
communications, industrial presentations, and exhibitions are provided. 

Every year, SOFSEM is the result of considerable effort by a number of people.
Its Advisory Board (Dines Bjørner, Manfred Broy, Michal Chytil, Peter
van Emde Boas, Georg Gottlob, Keith G. Jeffrey, Maria Zemankova) and
Endowment Board (Keith G. Jeffrey, Jan Pavelka, Frantisek Plasil, Igor Prívara,
Branislav Rovan, vice-chair, Jan Staudek, Jiří Wiedermann, chair) are in the
process of being transformed into a Steering Committee. All members of these
committees have devoted special attention to the silver jubilee SOFSEM and I am
glad to acknowledge this. As usual, the core of the work has been done by the
Program Committee and the Organizing Committee listed below. Special thanks
also go to the referees who helped to evaluate the 48 submissions, from which
18 papers have been selected by the Program Committee for presentation at the
conference and for publication in the Proceedings.

The invited talks are traditionally grouped into a number of tracks chosen by 
the Endowment Board for that particular year. SOFSEM ’98 has a special silver 
jubilee track featuring five talks with a historic aspect in the area of models 
of computation (R. Freivalds), algorithms (G. Ausiello), formal (M. Broy) and 
practical (D. Rombach) aspects of software, and database systems (G. van Emde 
Boas-Lubsen and P. van Emde Boas). In addition, four more tracks of invited talks
are featured: Parallel and Distributed Computing (M. Boasson, U. Kastens, and
P. Ruzicka), Electronic Commerce (P. Hanacek, W. Lamersdorf, B. Preneel,
L.A.M. Strous, and Ch. Vanoirbeek), Electronic Documents and Digital Libraries
(R. Lüling, Ch. Nikolaou, C. Roisin), and Trends in Algorithms (B. Chor,
A. Marchetti-Spaccamela, R. Niedermeier, J. Rolim). All but one of these talks
are documented in this volume (two of them as abstracts only). 

The decision of Springer-Verlag to make available in parallel the electronic
versions of the volumes published in the LNCS series adds extra duties to the
volume editors. Perhaps this is the necessary price for the convenience the
computer science community is enjoying. I would like to thank all the authors who
did their best to follow the guidelines. It would be very difficult (if at all possi- 
ble) to produce the volume in time without the technical assistance of Miroslav 
Chladny (Comenius University, Bratislava), who also designed and operated the 
electronic support for the work of the Program Committee. Last but not least, 
I would like to thank Springer-Verlag for the traditionally excellent and smooth
co-operation in producing this volume. 



September 1998 Branislav Rovan 

SOFSEM ’98 Program Committee Chair 
Abstract. Due to the many possible interactions with an ever changing
environment, combined with stringent requirements regarding temporal behaviour,
robustness, availability, and maintainability, large-scale embedded systems are 
very complex in their design. Coordination models offer the potential of separat- 
ing functional requirements from other aspects of system design. In this paper 
we present a software architecture for large-scale embedded systems that incor- 
porates an explicit coordination model. Conceptually the coordination model 
consists of application processes that interact through a shared data space - no 
direct interaction between processes is possible. Starting from this relatively 
simple model we derive successive refinements of the model to meet the require- 
ments that are typical for large-scale embedded systems. The software architec- 
ture has been applied in the development of commercially available command- 
and-control, and traffic management systems. Experience confirms that due to 
the resulting very high degree of modularity and maximal independence 
between modules, these systems are relatively easy to develop and integrate in 
an incremental way. Moreover, distribution of processes and data, fault-tolerant 
behaviour, graceful degradation, and dynamic reconfiguration are directly sup- 
ported by the architecture. 



1. Introduction 

Due to the many possible interactions with an ever changing environment, com- 
bined with stringent requirements regarding temporal behaviour, robustness, availabil- 
ity, and maintainability, large-scale embedded systems, like traffic management, 
process control, and command-and-control systems, are very complex in their design. 
The tasks performed by these systems typically include: (1) processing of measure- 
ments obtained from the environment through sensing devices, (2) determination of 
model parameters describing the environment, (3) tracking discrepancies between 
desired state and perceived state, (4) taking corrective action, and (5) informing the 
operator, or team of operators, about the current and predicted state of affairs. All tasks 
are very closely related and intertwined, and particularly in large-scale systems, there 
is a huge number of model parameters, which are often intricately linked through 
numerous dependencies. It is therefore a very natural approach to design the software 
for such systems as a monolithic entity, in which all relevant information (deductive
knowledge and actual data) is readily accessible for all the above mentioned parts. 

There is, however, a strong and well-known reason to proceed differently: a soft- 
ware system thus conceived is very difficult to implement, and even more difficult to 
modify should the purpose of the system be changed, or the description of the environment
be refined. In a modular approach to design, the various functions implemented
in software are separated into different modules that have some independence
from each other. Such an approach - well established today as standard software engi- 
neering practice - leads to better designs, and reduces development time and the likeli- 
hood of errors. 

Unfortunately, with today’s highly sophisticated systems, this is still not good 
enough. In addition to the functional requirements of these systems, many non-func- 
tional requirements, such as a high degree of availability and robustness, distribution 
of the processing over a possibly wide variety of different host processors, and (on- 
line) adaptability and extendibility, place constraints on the design freedom that can 
hardly be met with current design approaches. A methodology for the design of large- 
scale distributed embedded systems should provide (a basis for) an integral solution 
for the various types of requirements. Traditional design methods based on functional 
decomposition are not adequate. The sound principle of modularity needs therefore to 
be further exploited to cover non-functional requirements as well. 

Recently, coordination models and languages have become an active area of 
research [10]. In [11] it was argued that a complete programming model consists of 
two separate components: the computation model and the coordination model. The 
computation model is used to express the basic tasks to be performed by a system, i.e. 
the system’s functionality. The coordination model is applied to organize the functions 
into a coherent ensemble; it provides the means to create processes, and facilitates 
communication. One of the greater merits of separating computation from coordina- 
tion is the considerably improved modularity of a system. The computation model 
facilitates a traditional functional decomposition of the system, while the coordination 
model accomplishes a further decoupling between the functional modules in both 
space and time. This is exemplified by the relative success of coordination languages 
in the field of distributed and parallel systems. 

Since the early 80’s we have developed and refined a software architecture for 
large-scale distributed embedded systems [3], that is based on a separation between 
computation and coordination. Below, we first present the basic software architecture, 
after which we shall focus on the underlying coordination model. We demonstrate how 
the basic coordination model can be gradually refined to include non-functional 
aspects, such as distributed processing and fault-tolerance, in a modular fashion. Next 
we indicate how formal techniques can be introduced for reasoning about systems built 
according to this architecture. A short example illustrates various aspects of system 
design. We conclude with a discussion of our experiences in the design of commer- 
cially available command-and-control, and traffic management systems. 

2. Software Architecture 

A software architecture defines the organisational principle of a system in terms of 
types of components and possible interconnections between these components. In
addition, an architecture prescribes a set of design rules and constraints governing the 
behaviour of components and their interaction [5]. Traditionally, software architectures 
have been primarily concerned with structural organisation and static interfaces. With 
the growing interest in coordination models, however, more emphasis is placed on the 
organizational aspects of behaviour and interaction. 

In practice, many different software architectures are in use. Some well-known 
examples are the Client/Server and Blackboard architectures. Clearly, these architec- 
tures are based on different types of components - clients and servers versus knowl- 
edge sources and blackboards - and use different styles of interaction - requests from 
clients to servers versus writing and reading on a common blackboard. 

The software architecture, named SPLICE, that we developed for distributed 
embedded systems basically consists of two types of components: applications and a 
shared data space. Applications are active, concurrently executing processes that each 
implement part of the system’s overall functionality. Besides process creation, there is 
no direct interaction between applications; all communication takes place through a 
logically shared data space simply by reading and writing data elements. In this sense 
SPLICE bears strong resemblance to coordination languages and models like Linda 
[7], Gamma [1], and Swarm [18], where active entities are coordinated by means of a 
shared data space. 

2.1. The Shared Data Space 

The shared data space in SPLICE is organized after the well-known relational data 
model. Each data element in the shared data space is associated with a unique sort, that 
defines its structure. A sort definition declares the name of the sort and the record fields 
the sort consists of. Each record field has a type, such as integer, real, or string; various 
type constructors, such as enumerated types, arrays, and nested records, are provided 
to build more complex types. 

Sorts enable applications to distinguish between different kinds of information. A 
further differentiation between data elements of the same sort is made by introducing 
identities. As is standard in the relational data model, one or more record fields can be 
declared as key fields. Each data element in the shared data space is uniquely deter- 
mined by its sort and the value of its key fields. In this way applications can unambig- 
uously refer to specific data elements, and relationships between data elements can be 
explicitly represented by referring from one data element to the key fields of another. 

To illustrate, we consider a simplified example taken from the domain of air traffic 
control. Typically a system in this domain would be concerned with various aspects 
about flights, such as flight plans and the progress of flights as tracked from the reports 
that are received from the system’s surveillance radar. Hence, we define sorts
flightplan, report, and track as indicated in figure 1.

Sort flightplan declares four fields: a flight number, e.g. KL332 or AE1257, the
scheduled time for departure and arrival, and the type of aircraft that carries out the 
flight, e.g. a Boeing 737 or an Airbus A320. By declaring the flight number as a key 
field, it is assumed that each flight plan is uniquely determined by its flight number. 

Sort report contains the measurement vector of an object as returned at a specific
time by the system’s surveillance radar. The measurement vector typically contains
position information. A unique index is attached to be able to distinguish between
different reports.

sort flightplan
    key flightnumber : string
    departure : time
    arrival : time
    aircraft : string

sort report
    key index : integer
    measurement : vector
    timestamp : time

sort track
    key flightnumber : string
    timestamp : time
    state : vector

Figure 1: Sort definitions - an example

Through a correlation and identification process, the progress of individual flights is 
recorded in sort track. The state vector typically contains position and velocity infor- 
mation on the associated flight number, that is computed from consecutive measure- 
ments. The timestamp identifies the time at which the state vector has been last 
updated. 
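
The identity rule of this section (a data element is determined by its sort and the value of its key fields) can be sketched in a general-purpose language. The fragment below is only an illustration in Python, not SPLICE syntax: the helper names key and identity, the class names, and the use of floats for times and tuples for vectors are all assumptions made for the example.

```python
from dataclasses import dataclass, field, fields


def key():
    """Mark a record field as a key field (illustrative convention only)."""
    return field(metadata={"key": True})


@dataclass(frozen=True)
class FlightPlan:
    flightnumber: str = key()
    departure: float   # time, simplified to a number for this sketch
    arrival: float
    aircraft: str


@dataclass(frozen=True)
class Report:
    index: int = key()
    measurement: tuple  # measurement vector, e.g. a position
    timestamp: float


@dataclass(frozen=True)
class Track:
    flightnumber: str = key()
    timestamp: float
    state: tuple        # state vector, e.g. position and velocity


def identity(element):
    """Return (sort name, key values), which uniquely determines an element."""
    key_values = tuple(
        getattr(element, f.name) for f in fields(element) if f.metadata.get("key")
    )
    return (type(element).__name__, key_values)
```

Two flight plans with the same flight number but different departure times have the same identity under this rule, so the later one would replace the earlier one in the shared data space.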

2.2. Applications 

Basically, applications interact with the shared data space by writing and reading 
data elements. SPLICE does not provide an operation for globally deleting elements 
from the shared data space. Instead, data can be removed implicitly using an overwrit- 
ing mechanism. This mechanism is typically used to update old data with more recent 
values as the system’s environment evolves over time. Additionally, applications can 
hide data, once read, from their view. This operation enables applications to progressively
traverse the shared dataspace by successive read operations. Owing to the absence of a
global delete operation, the shared dataspace in SPLICE models a dynamically changing
information store, where data can only be read or written. This contrasts with the view
in which data elements represent shared resources that can be physically consumed by
applications.

SPLICE extends an existing (sequential) programming language with coordination 
primitives for creating processes and for interacting with the shared dataspace. More 
formally, the primitives are defined as follows. 

• create(f): creates a new application process from the executable file named f, and runs
it in parallel to the existing applications.

• write(a, x): inserts an element x of sort a into the shared data space. If an element 
of sort a with the same key value as x already exists in the shared dataspace, then the 
existing element is replaced by x. 
• read(a, q, t): reads an element of sort a from the shared dataspace, satisfying query
q. The query is formulated as a predicate over the record fields of sort a. In case a 
matching element does not exist, the operation blocks until either one becomes 
available or until the timeout t has expired. If the latter occurs, a timeout error is 
returned by the operation. The timeout is an optional argument: if absent the read 
operation simply blocks until a matching element becomes available. In case more 
than one matching element can be found, one is selected nondeterministically.

• get(a, q, t): operates identically to the read operation, except that the element
returned from the shared dataspace becomes hidden from the application’s view, that 
is, the same element cannot be read a second time by the application. 
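
The semantics of write, read, and get listed above can be mimicked by a small in-memory model. The following Python sketch is an assumption-laden illustration, not the SPLICE implementation: the class names SharedDataSpace and ApplicationView are invented here, the key is passed explicitly instead of being derived from a sort definition, and it is assumed that an element hidden by get becomes visible again once it is overwritten by a newer write.

```python
import threading
import time


class SharedDataSpace:
    """In-memory stand-in for the logically shared data space."""

    def __init__(self):
        self._elements = {}  # (sort, key) -> (version, element)
        self._version = 0
        self._changed = threading.Condition()

    def write(self, sort, key, element):
        """Insert element, replacing any element of the same sort and key."""
        with self._changed:
            self._version += 1
            self._elements[(sort, key)] = (self._version, element)
            self._changed.notify_all()

    def view(self):
        """Create the private view of one application."""
        return ApplicationView(self)


class ApplicationView:
    """One application's view; remembers what that application has hidden."""

    def __init__(self, space):
        self._space = space
        self._hidden = set()  # (sort, key, version) triples hidden by get

    def read(self, sort, query=lambda element: True, timeout=None):
        return self._await(sort, query, timeout, hide=False)

    def get(self, sort, query=lambda element: True, timeout=None):
        return self._await(sort, query, timeout, hide=True)

    def _await(self, sort, query, timeout, hide):
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._space._changed:
            while True:
                for (s, k), (version, element) in self._space._elements.items():
                    visible = (s, k, version) not in self._hidden
                    if s == sort and visible and query(element):
                        if hide:
                            self._hidden.add((s, k, version))
                        return element
                remaining = None if deadline is None else deadline - time.monotonic()
                if remaining is not None and remaining <= 0:
                    raise TimeoutError(f"no matching element of sort {sort!r}")
                # Block until some application writes, or the timeout expires.
                self._space._changed.wait(remaining)
```

In this model hiding is local to a view: after one application has performed a get on an element, every other application can still read it, which matches the absence of a global delete operation.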

The overwriting mechanism that is used when inserting data elements into the 
shared dataspace potentially gives rise to conflicts. If at the same time two different 
applications each write a data element of the same sort and with the same key value, 
one element will overwrite the other in a nondeterministic order. Consequently one of 
the two updates will be lost. In SPLICE this type of nondeterministic behaviour is con- 
sidered undesirable. The architecture therefore imposes the design constraint that for 
each sort at most one application shall write data elements with the same key value. 

As an illustration we return to the air traffic control example from the previous sec- 
tion. Consider an application process that tracks the progress of flight number n. This 
application continuously reads new reports from the surveillance radar and updates the 
track data of flight number n accordingly. The application process can be defined as 
indicated by the code fragment in figure 2. 

t := get(track, flightnumber = n);
repeat
    r := get(report, true);
    if correlates(r, t) then
        update(t, r);
        write(track, t);
    end if
until terminated(t);

Figure 2: Coordination primitives - an example. 

The application first reads the initial track data for flight number n from the shared 
dataspace. The initial data is produced by a separate application that is responsible for 
track initiation. The application then enters a loop where it first reads a new report r 
from the shared dataspace. If the report correlates with the current track t, as expressed
by the condition correlates(r, t), then track t is updated by the newly received report,
using the procedure update(t, r). The updated track is inserted into the shared
dataspace, replacing the previous track data of flight number n. This process is 
repeated until track t is terminated. Termination can be decided, for instance, if a track 
did not receive an update over a certain period of time. 
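
For readers who prefer an executable form, the tracking loop can be approximated as follows. This Python sketch makes several assumptions that are not in the text: the shared dataspace is replaced by a plain dictionary of tracks and an iterable of reports, correlates is a simple distance gate, update copies the measurement into the state, and termination is checked against each report's timestamp before correlation. The thresholds gate and max_silence are hypothetical.

```python
import math


def correlates(report, track, gate=5.0):
    """Hypothetical gating test: the report lies close to the track position."""
    return math.dist(report["measurement"], track["state"]) <= gate


def update(track, report):
    """Hypothetical update: adopt the measured position and its timestamp."""
    track["state"] = report["measurement"]
    track["timestamp"] = report["timestamp"]


def terminated(track, now, max_silence=10.0):
    """A track ends when it has received no update for max_silence time units."""
    return now - track["timestamp"] > max_silence


def run_tracker(tracks, reports, n):
    """Track flight n; tracks stands in for sort track, reports for sort report."""
    t = dict(tracks[n])               # t := get(track, flightnumber = n)
    for r in reports:                 # r := get(report, true)
        if terminated(t, r["timestamp"]):
            break
        if correlates(r, t):
            update(t, r)
            tracks[n] = dict(t)       # write(track, t)
    return t
```

Reports that do not correlate with the track are simply skipped, which corresponds to the loop in figure 2 proceeding to the next report without writing.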

3. Refinements of the Architecture 



The shared dataspace architecture is based on an ideal situation where many non- 
functional requirements, such as distribution of data and processing across a computer 
network, fault-tolerance, and system response times, need not be taken into account.
We next discuss how, through a successive series of modular refinements, a software 
architecture can be derived that fully supports the development of large-scale, distrib- 
uted embedded systems. 

3.1. A Distributed Software Architecture 

The first aspect that we consider here is distribution of the shared data space over a 
network of computer systems. The basic architecture is refined by introducing two
additional components. As illustrated in figure 3, the additional components consist of 
agents and a communication network. 

Each application process interacts with exactly one agent. An agent embodies a 
local database for storing data elements, and processing facilities for handling all com- 
munication needs of the application processes. All agents are identical and need no 
prior information about either the application processes or their communication 
requirements. Communication between agents is established by a message passing 
mechanism. Messages between agents are handled by the communication network that 
interconnects them. The network must support broadcasting, but should preferably 
also support direct addressing of agents, and multicasting. An application process 
interacts with its assigned agent by means of the interaction primitives from section
2.2. The interaction with agents is transparent with respect to the shared dataspace
model: application processes continue to operate on a logically shared dataspace. 

The agents are passive servers of the application processes, but are actively involved 
in establishing and maintaining the required inter-agent communication. The commu- 
nication needs are derived dynamically by the collection of agents from the read and 
write operations that are issued by the application processes. The protocol that is used 
by the agents to manage communication is based on a subscription paradigm that can 
be briefly outlined as follows.

Figure 3: A distributed software architecture

First consider an application that performs a write operation. The data element is 
transferred to the application’s agent, which initially stores the element into its local 
database, overwriting any existing element of the same sort and with the same key 
value. 

Next consider an application that issues a read request for a given sort. Upon receipt 
of this request, the application’s agent first checks whether this is the first request for
that particular sort. If it is, the agent broadcasts the name of the sort on the network. 

All other agents, after receiving this message, register the agent that performed the 
broadcast as a subscriber to the sort carried by the message. Next, each agent verifies if 
its local database contains any data elements of the requested sort, previously written 
by its application process, in which case copies of these elements are transferred to the 
newly subscribed agent. After this initial transfer, any subsequently written data of the 
requested sort will be immediately forwarded to all subscribed agents. 

Each subscribed agent stores both the initially and all subsequently transferred cop- 
ies into its local database, overwriting any existing data of the same sort and with the 
same key value. During all transfers a protocol is used that preserves the order in 
which data elements of the same sort have been written by an application. This mecha- 
nism in combination with the architecture’s design constraint that for each sort at most 
one application writes data elements with the same key value, guarantees that over- 
writes occur in the same order with all agents. Otherwise, communication by the 
agents is performed asynchronously. 

The search for data elements matching the query of a read request is performed 
locally by each agent. If no matching element can be found, the operation is suspended 
either until new data of the requested sort arrives or until the specified timeout has 
expired. 

Execution of a get operation is handled by the agents similarly to the read operation, 
except that the returned data element is removed from the agent’s local database. 
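
The subscription protocol described above can be illustrated by a small synchronous simulation. The Python sketch below rests on several assumptions: the classes Network and Agent and their method names are invented for the example, message passing is replaced by direct method calls (so ordering is trivially preserved), and a read that finds no match returns None instead of suspending until data arrives or a timeout expires.

```python
class Network:
    """Stand-in for the communication network; supports broadcast only."""

    def __init__(self):
        self.agents = []

    def attach(self, agent):
        self.agents.append(agent)

    def broadcast(self, sender, sort):
        for agent in self.agents:
            if agent is not sender:
                agent.on_subscribe(sender, sort)


class Agent:
    def __init__(self, name, network):
        self.name = name
        self.network = network
        self.local = {}        # (sort, key) -> element: local database
        self.written = set()   # identities written by this agent's application
        self.subscribers = {}  # sort -> agents subscribed to that sort
        self.requested = set() # sorts for which a read was already issued
        network.attach(self)

    def write(self, sort, key, element):
        # Store locally (overwriting), then forward to all current subscribers.
        self.local[(sort, key)] = element
        self.written.add((sort, key))
        for subscriber in self.subscribers.get(sort, []):
            subscriber.on_data(sort, key, element)

    def read(self, sort, query=lambda element: True):
        # The first request for a sort is announced by broadcasting its name.
        if sort not in self.requested:
            self.requested.add(sort)
            self.network.broadcast(self, sort)
        for (s, _), element in self.local.items():
            if s == sort and query(element):
                return element
        return None  # a real agent would suspend the operation here

    def on_subscribe(self, subscriber, sort):
        # Register the subscriber, then transfer previously written elements.
        registered = self.subscribers.setdefault(sort, [])
        if subscriber not in registered:
            registered.append(subscriber)
        for (s, k) in self.written:
            if s == sort:
                subscriber.on_data(sort, k, self.local[(s, k)])

    def on_data(self, sort, key, element):
        # Replicas overwrite any existing element of the same sort and key.
        self.local[(sort, key)] = element
```

Only agents whose application has issued a read for a sort ever hold replicas of that sort, which is the selective replication discussed in the next paragraph.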

As a result of this protocol, the shared dataspace is selectively replicated across the 
agents in the network. The local database of each agent contains data of only those 
sorts that are actually read or written by the application it serves. In practice the 
approach is viable, particularly for large-scale distributed systems, since the applica- 
tions are generally interested in only a fraction of all sorts. Moreover, the communica- 
tion pattern in which agents exchange data is relatively static: it may change when the 
operational mode of a system changes, or in a number of circumstances in which the 
configuration of the system changes (such as extensions or failure recovery). Such 
changes to the pattern are rare with respect to the number of actual communications 
using an established pattern. It is therefore beneficial from a performance point of view 
to maintain a subscription registration. After an initial short phase each time a new sort 
has been introduced, the agents will have adapted to the new communication require- 
ment. This knowledge is subsequently used by the agents to distribute newly produced 
data to all the agents that hold a subscription. Since subscription registration is main- 
tained dynamically by the agents, all changes to the system configuration will automat- 
ically lead to adaptation of the communication patterns. 

Note that there is no need to group the distribution of a data element to the collec- 
tion of subscribed agents into an atomic transaction. This enables a very efficient 
implementation in which the produced data is distributed asynchronously and the 
latency between actual production and use of the data depends largely on the consum- 
ing application processes. This results in upper bounds that are acceptable for distrib- 
uted embedded systems where timing requirements are of the order of milliseconds. 

3.2. Temporal Aspects

The shared dataspace, as introduced in section 2.1, models a persistent store: data
once written remains available to all applications until it is either overwritten by a new 
instance or hidden from an application’s view by execution of a get operation. The per- 
sistence of data decouples applications in time. Data can be read, for instance, by an 
application that did not exist the moment the data was written, and conversely, the 
application that originally wrote the data might no longer be present when the data is 
actually read. 

Applications in the embedded systems domain deal mostly with data instances that 
represent continuous quantities: data is either an observation sampled from the sys- 
tem’s environment, or derived from such samples through a process of data association 
and correlation. The data itself is relatively simple in structure; there are only a few 
data types, and given the volatile nature of the samples, only recent values are of inter- 
est. However, samples may enter the system at very short intervals, so sufficient 
throughput and low latency are crucial properties. In addition, but to a lesser extent, 
embedded systems maintain discrete information, which is either directly related to 
external events or derived through qualitative reasoning from the sampled input. 

This observation leads us to refine the shared dataspace to support volatile as well 
as persistent data. The sort definition, whose basic format was introduced in section 
2.1, is extended with an additional attribute that indicates whether the instances of a 
sort are volatile or persistent. For persistent data the semantics of the read and write 
operations remain unchanged. Volatile data, on the other hand, will only be visible to 
the collection of applications that is present at the moment the data is written. Any 
application that is created afterwards, will not be able to read this data. 

Returning to the air traffic control example from figure 1, the sort report can be 
classified as volatile, whereas the sorts track and flightplan are persistent. Conse- 
quently, the tracking process, as specified in figure 2, does not receive any reports from 
the surveillance radar that were generated prior to its creation. After the tracking proc- 
ess has been created, it first gets the initial track data and then waits until the next 
report becomes available. 

Since the initial track data is produced exactly once, the tracking process must be 
guaranteed to have access to it, otherwise the process might block indefinitely. This 
implies that the sort track must be persistent. 

The subscription-based protocol, that manages the distribution of data in a network 
of computer systems, can be refined to exploit the distinction between volatile and per- 
sistent data. Since volatile data is only available to the applications that are present at 
the moment the data is written, no history needs to be kept. Consequently, if an application
writes a data element, it is immediately forwarded to the subscribed agents,
without storing a copy in the local database of the application’s agent. This optimization reduces
the amount of storage that is required. Moreover, it eliminates the initial transfer of any 
previously written data elements, when an application performs the first read operation 
on a sort. This enables a newly created application to integrate into the communication 
pattern without initial delay, which better suits the timing characteristics that are typi- 
cally associated with the processing of volatile data. 
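
The difference in visibility between volatile and persistent sorts can be made concrete with a toy model. The Python sketch below is an assumption, not the SPLICE mechanism: the classes DataSpace and Application are invented, distribution is ignored, and each application simply holds a dictionary of the elements visible to it.

```python
class Application:
    def __init__(self, name):
        self.name = name
        self.visible = {}  # (sort, key) -> element visible to this application

    def read(self, sort):
        return [e for (s, _), e in self.visible.items() if s == sort]


class DataSpace:
    def __init__(self, volatile_sorts):
        self.volatile_sorts = set(volatile_sorts)
        self.persistent = {}  # history kept only for persistent sorts
        self.apps = []

    def create_app(self, name):
        app = Application(name)
        # A newly created application sees persistent data written earlier,
        # but none of the volatile data that preceded its creation.
        app.visible.update(self.persistent)
        self.apps.append(app)
        return app

    def write(self, sort, key, element):
        if sort not in self.volatile_sorts:
            self.persistent[(sort, key)] = element
        # Every application present at this moment sees the new element.
        for app in self.apps:
            app.visible[(sort, key)] = element
```

With report declared volatile and track persistent, an application created after a report and a track have been written can read the track but not the report.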

3.3. Fault-tolerance

Due to the stringent requirements on availability and safety that are typical of large- 
scale embedded systems, there is the need for redundancy in order to mask hardware 
failures during operation. Fault-tolerance in general is a very complex requirement to 
meet and can, of course, only be partially solved in software. In SPLICE, the agents 
can be refined to provide a mechanism for fault-tolerant behaviour. The mechanism is 
based on both data and process replication. By making fault-tolerance a property of the 
software architecture, the design complexity of applications can be significantly 
reduced. 

In this paper we only consider failing processing units, and we assume that if a 
processor fails, it stops executing. In particular, we assume that communication never 
fails indefinitely and that data does not get corrupted. 

If a processing unit in the network fails, the data that is stored in this unit, will be 
permanently lost. The solution is to store copies of each data element across different 
units of failure. The subscription-based protocol described in section 3.1 already 
implements a replicated storage scheme, where copies of each data element are stored 
with the producer and each of the consumers. The basic protocol, however, is not suffi- 
cient to implement fault-tolerant data storage in general. For instance, if data elements 
of a specific sort have been written but not (yet) read, the elements are stored with the 
producer only. A similar problem occurs if the producers and consumers of a sort hap- 
pen to be located on the same processing unit. 

The solution is to store a copy of each data element in at least one other unit of fail- 
ure. The architecture as depicted in figure 3 is extended with an additional type of 
component: a persistent database. This component executes a specialized version of 
the subscription protocol. On start-up a persistent database broadcasts the name of 
each persistent sort on the network. As a result of the subscription protocol that is exe- 
cuted by the collection of agents, any data element of a persistent sort that is written by 
an application, will be automatically forwarded to the persistent database. There can be 
one or more instances of the persistent database executing on different processing 
units, dependent on the required level of system availability. Moreover, it is possible to 
load two or more persistent databases with disjoint sets of sort names, leading to a dis- 
tributed storage of persistent data. 

When a processing unit fails, the applications that are executed by this unit are
lost as well. The architecture can be refined to support both passive and active replication of
applications across different processing units in the network.

Using passive replication, only one process is actually executing, while one or more 
back-ups are kept off-line, either in main memory or on secondary storage. When the 
processing unit executing the active process fails, one of the back-ups is activated. In 
order to be able to restore the internal state of the failed process, it is required that each 
passively replicated application writes a copy of its state to the shared dataspace each 
time the state is updated. The internal state can be represented by one or more persist- 
ent sorts. When a back-up is activated, it will first restore the current state from the 
shared dataspace and then continue execution. 

When timing is critical, active replication of processes is often a more viable solution.
In that case, multiple instances of the same application execute in parallel,
hosted by different processing units; all instances read and write data. Typically, active
replication is used when interruption of services cannot be tolerated.

The subscription-based protocol can be refined to support active replication trans- 
parently. If a particular instance of a replicated application performs a write operation, 
its agent attaches a unique replication index as a key field to the data element. The 
index allows the subscribed agents to distinguish between the various copies that they 
receive from a replicated application. Upon a read request, an agent first attempts to 
return a matching element having a fixed default index. When, after some appropriate 
time-out has expired, the requested element is still not available, a matching element 
with an index other than the default is returned. From that moment on it is assumed 
that the application corresponding to the default index has failed, and the subscription 
registration is updated accordingly. The index of the actually returned data element 
now becomes the new default. 
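
The read-side selection described above can be sketched as follows. This is a minimal illustration, not the SPLICE agent code: the type `copy`, the function `select_copy`, and the reduction of a data element to a single integer are assumptions made for the example, and the time-out is passed in as a flag rather than measured.

```c
#include <stddef.h>

/* One received copy of a data element (hypothetical layout). */
struct copy {
    int repl_index;  /* replication index attached by the writer's agent */
    int value;       /* payload, reduced to an int for this sketch */
};

/*
 * Choose the copy to return for a read request.
 * Returns the copy carrying the default index if one is present.
 * Otherwise returns NULL while the time-out has not expired, and after
 * the time-out fails over to another copy and records its index as the
 * new default.
 */
const struct copy *select_copy(const struct copy *copies, size_t n,
                               int *default_index, int timed_out)
{
    const struct copy *fallback = NULL;

    for (size_t i = 0; i < n; i++) {
        if (copies[i].repl_index == *default_index)
            return &copies[i];       /* copy from the default instance */
        if (fallback == NULL)
            fallback = &copies[i];   /* first copy from another instance */
    }
    if (!timed_out || fallback == NULL)
        return NULL;                 /* keep waiting for the default */

    /* The default instance is assumed to have failed. */
    *default_index = fallback->repl_index;
    return fallback;
}
```

A caller would invoke `select_copy` repeatedly until it returns a non-NULL pointer, setting `timed_out` once its own timer has expired.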

A general overview of the distributed software architecture supporting fault-toler- 
ance based on the various data and process replication techniques is given in figure 4. 

3.4. System Modifications and Extensions 

In the embedded systems domain requirements on availability often make it neces- 
sary to support modifications and extensions while the current system remains on-line. 
There are two distinct cases to be considered. 

• The upgrade is an extension to the system, introducing new applications and sorts
but without further modifications to the existing system.

• The upgrade includes modification of existing applications.

Since the subscription registration is maintained dynamically by the agents, it is 
obvious that the current protocol can deal with the first case without further refine- 
ments. After installing and starting a new application, it will automatically integrate. 

The second case, clearly, is more difficult. One special, but important, category of 
modifications can be handled by a simple refinement of the agents. Consider the problem
of upgrading a system by replacing an existing application process with one that
implements the same function, but using a better algorithm, leading to higher quality
results. In many systems it is not possible to physically replace the current application
with the new one, since this would require the system to be taken off-line.

By a refinement of the agents it is possible to support on-line replacement of appli- 
cations. If an application performs a write operation, its agent attaches an additional 
key field to the data element representing the application’s version number. Upon a 
read request, an agent now first checks whether multiple versions of the requested 
instance are available in the local database. If this is the case, the instance having the 
highest version number is delivered to the application - assuming that higher numbers 
correspond to later releases. From that moment on, all data elements with lower version
numbers, received from the same agent, are discarded. In this way an application
can be dynamically upgraded, simply by starting the new version of the application,
after which it will automatically integrate and replace the current version.
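
A minimal sketch of the version check, under simplifying assumptions: the names `versioned` and `select_latest` are hypothetical, a data element is reduced to one integer, and the per-agent bookkeeping is collapsed into a single `min_version` threshold.

```c
#include <stddef.h>

/* A stored instance tagged with the writer's version number (hypothetical). */
struct versioned {
    int version;  /* version number attached by the writing agent */
    int value;    /* payload, reduced to an int for this sketch */
};

/*
 * Return the index of the instance with the highest version number,
 * ignoring instances below *min_version, or -1 if none qualifies.
 * On success *min_version is raised to the delivered version, so that
 * instances from older releases are skipped from then on.
 */
int select_latest(const struct versioned *e, size_t n, int *min_version)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (e[i].version < *min_version)
            continue;                     /* older release: discarded */
        if (best < 0 || e[i].version > e[best].version)
            best = (int)i;
    }
    if (best >= 0)
        *min_version = e[best].version;
    return best;
}
```

Starting a new release then needs no further action at the consumer: the first element written by the new version raises the threshold.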

4. Formalisation 

In this section we will briefly sketch ongoing work in developing a theoretical
framework for SPLICE and SPLICE-based systems. 

Process algebras provide a well-known formal method for reasoning about distrib- 
uted systems [17][12]. Processes in SPLICE are sequential processes, that interact with 
each other by means of read and write actions on the shared dataspace. It therefore 
seems natural to study SPLICE in the context of Communicating Sequential Processes 
(CSP [12][13]). In CSP-like algebras, the semantics of processes is given in terms of
traces: the behaviour of a process is represented by sequences of its communication 
actions. Unlike CSP, communication actions in SPLICE are not synchronized. There- 
fore, we use a process algebra for Data-Flow Networks (Dfn), an asynchronous vari- 
ant of CSP, as basis for a SPLICE Process Algebra (Spa). In Dfn, output is never 
blocked: the environment is always ready to accept data. Dfn is subsumed in Receptive 
Process Theory, RPT [15]. A general explanation of Dfn in the context of CSP can be 
found in [13]. The development of Spa is inspired by [16] where another sub-theory of 
RPT is presented which is based on delay-insensitive communications. 

In Spa we model SPLICE processes at a certain level of abstraction, where we 
focus on the communication behaviour of processes. Processes in Spa are typically 
denoted by P, Q and R; different sorts are typically denoted by a, b, and c. Processes
have an input and an output alphabet, denoted with i(P) and o(P) respectively. These 
alphabets define the sorts that can be read or written by P according to the subscription 
paradigm of SPLICE. We abstract from the agents; their role as manager of communi- 
cation is integrated in the semantics of the operators in Spa. The local data space is 
modelled with an after operator. Communication between different processes is 
expressed by parallel composition. The idea is that data a.v (of sort a and value v) written
by P is read by Q (given that $a \in o(P)$ and $a \in i(Q)$; i.e. P is a publisher of a
and Q is subscribed to a). We do not make assumptions about Q actually reading a.v,
we only assume that Q eventually will have a.v in its local data space. In general, there 
may be multiple subscribers of sort a, but only one publisher. 

In Spa, the coordination primitives of SPLICE are written as: 

• a!v: Write an element of sort a and value v. We demand that $a \in o(P)$ if P is
the process publishing a.v. Writing is never blocked by the environment.

• a?x: Read an element of sort a (nondestructive read). We demand that
$a \in i(P)$ if P is the process that reads an element of sort a. The read operation
is blocking when no x is available.

• a? X: Get (read and remove) an element x of sort a (destructive read). We demand
that $a \in i(P)$, with P the process performing this operation. The get operation
is blocking when no element of sort a is available.

• P/a.v: P after reception of data a.v ($a \in i(P)$). In terms of SPLICE, a.v has
been stored in the local data space of P.

• P || Q: Parallel composition of P and Q.

Just as in Dfn, we have to impose some syntactic restrictions on processes in Spa. It 
is required that i(P) and o(P) are finite, that the output alphabet is non-empty, and that 
the input and output alphabets are disjoint: 

$$o(P) \neq \emptyset, \qquad i(P) \cap o(P) = \emptyset$$

The host language is replaced by an abstract programming language based on 
guarded commands [9]. This language has operators for nondeterministic choice, pre- 
fixing, guarded-choice, and recursion. 

Using the Spa operators presented above we can express several coordination prim- 
itives, such as asynchronously writing, (non)destructively reading, dynamic process 
creation, process replication, et cetera. Via Spa we can use Dfn to represent the 
SPLICE primitives. The laws of the algebra define valid program transformations for 
SPLICE processes. Details are beyond the scope of this paper, however, and the reader 
is referred to [8]. 

The goal of the formalization is to support the design of SPLICE-based systems 
within a process algebra. Design rules and guidelines can be formally derived and used 
to improve the development process. Design rules put (extra) constraints on the com- 
position of program constructs; guidelines streamline the task of developing programs 
that conform to the design rules. The benefit of restricting the design freedom through 
design rules, is the guarantee of certain emergent system properties. Therefore, the 
next step is to focus on the definition of design rules and related guidelines. 

Another step is to extend the read and get operations with the query mechanism of 
SPLICE; that is, an element is read when it is available and when the given query 
holds for this element; otherwise, the operation blocks. With such a mechanism we
can distinguish the order in which data is read. Because reordering read operations in 
the presence of queries might introduce deadlock, we will need to impose restrictions 
on the use of the query mechanism. Complete understanding of these restrictions is 
only possible through the use of mathematical techniques. 

Besides the coordination primitives, there are other important features of SPLICE 
to be expressed. We mention real-time and fault-tolerant behaviour. Currently, we can 
express data and process replication in Spa; future work will focus on design rules for 
fault-tolerant behaviour and on the introduction of temporal properties in our formal 
framework. 

5. Example 

To demonstrate the benefits of SPLICE in designing complex systems, a full-scale
example should be presented. This is clearly beyond the scope of this paper, and we
therefore restrict ourselves to an academic problem that nevertheless has interesting
properties.

Imagine a rectangular grid of railway tracks, intersecting at regular intervals, such 
that trains cannot change from one track to another. Now suppose there is exactly one 
train on each of the n east-west tracks, and on each of the n north-south tracks. Each 
train has a given amount of time it should spend going from one end of its track to the 
other. Trains only move either from west to east, or from north to south. When reach- 
ing the end of a track, a train jumps back to the beginning and repeats its journey with 
a new deadline for reaching the other end. The journey times are random perturbations 
over a preset time given for each train separately. Trains are required to: 

* travel safely,

* meet their deadlines, to the extent possible, and

* as much as possible, provide passengers with a comfortable journey,

with decreasing relative importance in the order given.

Now suppose that a train cannot be trusted to carry out a given schedule; it will
therefore be impossible to compute for each train what the best schedule for a given
journey would be. (Note that the random variations on the travel times from start to
finish already make computing a globally optimal schedule for all trains impractical.) It
will thus be necessary to control the trains dynamically: depending on the actual situation,
trains need to adjust their speeds to fulfil the requirements.

There are several possible solutions to this problem. At a high level of abstraction, 
there is the distinction between central and distributed control. Central control, 
although intuitively attractive for its (perceived) conceptual simplicity, has the drawback
of poor scalability [6]. In the case of distributed control, there are different ways
of organizing the control function: it may either be associated with the railway tracks, 
or with the trains themselves. 

The solution presented here is based on the following principle: 

Every train periodically makes available its status. The status contains the following 
information: 

    time,                  -- time at which status was produced
    position,              -- position of head of train at "time"
    nextX,                 -- next crossing
    onX,                   -- either 0 when not on a crossing,
                           -- or crossing number of occupied crossing
    length,                -- length of the train
    speed,                 -- speed at time "time"
    acc,                   -- acceleration at time "time"
    delay,                 -- delay w.r.t. schedule at time "time"
    action taken,          -- action relating to collision avoidance
    maximum speed,
    maximum acceleration,
    maximum deceleration

Each train in addition obtains status information for those trains that it might conceivably
collide with. All trains then independently, and without further communication,
determine what their course of action must be for satisfying the requirements. In
order to do so, all trains obey the same set of traffic rules. Evaluating the rules every
time new information is obtained then allows the trains to autonomously (i.e. without
interaction with other trains) decide what action is appropriate.

The following set of rules has been implemented (the rules marked * are necessary 
for safety and cannot be removed; the others are optional): 





     condition                       action
     ------------------------------  -------------------------------
  *  conflicting decisions           east-west train has priority
  *  train already on Xing           keep going
  *  other train on Xing             wait for Xing to be clear again
  *  other train has taken action    take complementary action
     different delays                most delayed has priority
     different speeds                faster train has right of way
     different lengths               shortest train goes first
  *  otherwise                       east-west has priority
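
One way to read the rule set is as a decision function evaluated from top to bottom for a pair of trains approaching the same crossing. The sketch below is an interpretation, not the finite state automaton of the trains program. It assumes that the rules are tried in the order listed, that "conflicting decisions" means both trains report the same action, and that the type and function names (`status`, `action`, `decide`) are hypothetical.

```c
/* Possible outcomes of the decision procedure (hypothetical encoding). */
enum action { ACT_NONE, ACT_GO, ACT_YIELD };

/* The subset of a train's status used by the rules (hypothetical layout). */
struct status {
    int east_west;            /* 1 for a west-east train, 0 for north-south */
    int on_x;                 /* 0, or the number of the occupied crossing */
    double delay;             /* delay w.r.t. schedule */
    double speed;             /* current speed */
    double length;            /* length of the train */
    enum action action_taken; /* action relating to collision avoidance */
};

/* Decide what "me" does with respect to "other" at crossing xing (> 0). */
enum action decide(const struct status *me, const struct status *other,
                   int xing)
{
    /* conflicting decisions: east-west train has priority */
    if (me->action_taken != ACT_NONE &&
        me->action_taken == other->action_taken)
        return me->east_west ? ACT_GO : ACT_YIELD;
    /* train already on Xing: keep going */
    if (me->on_x == xing)
        return ACT_GO;
    /* other train on Xing: wait for Xing to be clear again */
    if (other->on_x == xing)
        return ACT_YIELD;
    /* other train has taken action: take complementary action */
    if (other->action_taken == ACT_GO)
        return ACT_YIELD;
    if (other->action_taken == ACT_YIELD)
        return ACT_GO;
    /* optional rules: delay, then speed, then length */
    if (me->delay != other->delay)
        return me->delay > other->delay ? ACT_GO : ACT_YIELD;
    if (me->speed != other->speed)
        return me->speed > other->speed ? ACT_GO : ACT_YIELD;
    if (me->length != other->length)
        return me->length < other->length ? ACT_GO : ACT_YIELD;
    /* otherwise: east-west has priority */
    return me->east_west ? ACT_GO : ACT_YIELD;
}
```

Because both trains evaluate the same function on the same pair of status records with the roles swapped, an east-west and a north-south train reach complementary decisions without exchanging further messages.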



5.1. Discussion 

There are two aspects that merit discussion. One is the validity of this solution, the 
other is its implementation. 

The argument that trains will meet the requirements consists of several parts. One 
approach would be to first show that the set of behavioural rules is sufficient to prevent 
accidents, under the assumption that trains detect an imminent collision in time to 
brake. Second, conditions would then have to be specified that guarantee the validity of
that assumption. And finally it must be shown that if there is no collision danger (any
more), a train will focus on meeting its deadline.

This paper’s topic being distributed reactive systems, we will limit the discussion to 
the implementation of the solution using SPLICE. 

5.1.1. Structure of the solution 

The essence of the solution is that each train has information about the behaviour of 
all other trains, or at least the relevant subset of them (trains on parallel tracks will 
never collide, and thus there is no need for information about such trains). SPLICE 
provides an ideal mechanism for organizing this information exchange. As explained 
above, the model underlying SPLICE is a shared dataspace, upon which individual 
application programs may perform actions, like reading and writing of data elements. 

The obvious approach is therefore to model each of the trains as a separate program,
where programs only differ in their parametrization to indicate track and static
train properties. Each program consists of an endless cycle, in which it repeatedly 
requests data from SPLICE, then determines whether action is necessary, and subse- 
quently writes its current state to SPLICE. This cycle is repeated at the required sam- 
ple rate, which in the case of our experiments was 5 times per second. Note that in 
general there is a relationship between sample rate, maximum allowable speed, and 
network geometry.

5.1.2. Scalability issues

For scalability, and in order to improve efficiency, it is desirable to limit the amount
of data to be communicated to a minimum. Clearly, with a track-grid of size n, there
are $n^2$ communications necessary for all the trains to obtain information about all the
others travelling in a different direction. Thus, minimizing the amount of information
to be exchanged can make a huge difference in network load.

In addition, since a train cannot collide with another train on a track that has been 
crossed already, it is not necessary for a train to obtain status data from trains on those 
tracks. So, as a train moves from the beginning of its track to the end, the number of
potentially dangerous trains decreases monotonically, and consequently, the train
program’s processing requirements should diminish accordingly. Since in our simulation,
trains are randomly distributed over the network as a result of the random perturbations 
in their schedules, the average processing load should benefit from exploiting this 
observation. Note that in a real setting the expected processing load for a train would 
probably be very low, and the only necessary optimization would be in reducing com- 
munication requirements. 

5.1.3. Implementation 

Forgetting about a possible optimization in which the sample rate for a particular 
train is made dependent on its speed, both optimizations can be very easily expressed 
using SPLICE. To achieve a minimal communication load, the first improvement is to 
split the state data into two separate structures: one that describes (pseudo-) static 
properties of a train and therefore only changes rarely, if ever, and another that 
describes the actual position, speed and other variable attributes of a train. Simple as 
this may appear, this approach introduces the difficulty of ensuring that the properties 
and dynamic data for a particular train are always processed together. Clearly, if the 
properties data never change, it suffices to read them once only, during a program ini- 
tialization phase. If however, due to unforeseen circumstances a train’s abilities may 
suddenly change, this data will have to be read at unpredictable moments. The shared 
dataspace provides a solution that is both elegant and effective. SPLICE supports sub- 
scriptions to so-called multi-sorts: collections of individual sorts that must be proc- 
essed together, and that are related through a common key (as illustrated in figure 1). 
Depending on the needs of the application program, data can be retrieved from 
SPLICE in different ways; one of them will return fresh data for all those sorts that 
have received an update since the last read operation, and stale (previously read) data 
when no new data has been received. SPLICE ensures that in either case the data of the 
different sorts making up the multi-sort, form a coherent set, as expressed through the 
common key. Thus, rather than producing information in complete records, the data is 
assembled at the consuming side, for which the application only needs to provide a 
specification. This mechanism contrasts rather strongly with traditional message-passing
and client-server systems, such as object-oriented designs, where the burden is
upon the producer of information.

Preventing state data from trains on parallel tracks from being needlessly communicated
is slightly more difficult. The first step is for a train program to express interest in
orthogonal trains only. This is easily accomplished in SPLICE through the built-in filter
mechanism, which permits a content-dependent refinement of the subscription to
be specified. Thus, provided the track direction of a particular train is encoded in the
state data, it is possible to express that only data is required where the direction is different
from one’s own. SPLICE will evaluate this filter expression for all incoming
data elements (of the relevant sort), and will discard those that fail the test. Depending 
on the structure of the application, SPLICE may decide to evaluate the filter expression 
at the producer’s site, rather than at the consumer’s, for better utilizing the available 
communication bandwidth. Note that this involves migrating the filter from the con- 
sumer location (where it is defined), to the producer’s site. 

Both techniques can be illustrated with the following example, taken verbatim from 
the trains program: 

char dir_filter[64];
char *df = dir_filter;

A filter expression is a string expressing a condition, that is compiled into byte (or 
machine) code by SPLICE upon its definition. 

struct sp_consumer_part train_state[] = {
    {sp_use_sort(train_descr), "@id", "", 0, NULL, 1, &df},
    {sp_use_sort(state_descr), "@id", "", 0, NULL, 1, &df}};

This declaration describes a multi-sort, consisting of two constituent sorts:
“train_descr” and “state_descr”, as adumbrated above.

sprintf(dir_filter, "@id %c 0",
        myself.track_dir == dir_hor ? '<' : '>');

This statement generates the actual filter expression: “id” is the common key in both 
sorts, and encodes the track the train is using. Negative track numbers are N-S tracks, 
positive track numbers are W-E tracks. Since there is only one train per track, this provides
unique identification of a train. For ease of programming, the static data
“myself” in addition contains an enumeration type giving the track direction:
“track_dir”. Thus, depending on the train’s direction, SPLICE will check for positive 
or negative “id” values in received data, and discard the unwanted instances of parallel 
trains. 
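
The effect of the direction filter can be mimicked locally with a plain predicate on the sign of “id”. The sketch below does not call SPLICE; `keep_orthogonal` and the enumeration constants `DIR_HOR` and `DIR_VER` are hypothetical stand-ins, and the sign convention (negative for N-S, positive for W-E) is the one stated above.

```c
#include <stddef.h>

/* Track direction of the consuming train (hypothetical constants). */
enum track_dir { DIR_HOR, DIR_VER };

/*
 * Copy to out[] the ids of trains on orthogonal tracks and return how
 * many were kept. A W-E train (DIR_HOR) keeps negative ids (N-S tracks);
 * an N-S train (DIR_VER) keeps positive ids (W-E tracks).
 * out[] must have room for n entries.
 */
size_t keep_orthogonal(enum track_dir my_dir, const int *ids, size_t n,
                       int *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        int wanted = (my_dir == DIR_HOR) ? (ids[i] < 0) : (ids[i] > 0);
        if (wanted)
            out[kept++] = ids[i];   /* orthogonal train: keep */
        /* parallel train: discarded, as the filter would do */
    }
    return kept;
}
```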

ts_cons = sp_start_consuming_multi(app, 2, train_state,
                                   1, &w, NULL,
                                   "splice", "", "", NULL, SP_SYNC_DEFAULT);

This statement, finally, starts the multi-sort subscription. 

In order to prevent processing of data on tracks already behind a given train, another 
mechanism is used. Rather than refining interest in subscribed data through filter
expressions, we use the query facility to exclude this data from being returned to the 
application. 

q = sp_define_query(ts_cons, "@track==%p matching @tail<=%f",
                    &p_track, Xings[myself.track]);

This way, the data will always be available in the local instance of the shared 
dataspace, which in a subsequent cycle of the train over its track will be needed for 
processing anyway. 

The choice between the filter and the query mechanism is partly a matter of taste,
and partly one of efficiency. Filter expressions cannot contain references to program
variables and it is relatively expensive to install a filter, whereas query expressions may 
link data fields to actual program states. Since a new filter would have to be installed 
after every intersection, it was decided to use the query mechanism, even though com- 
munication overhead cannot be reduced in this way. 

5.1.4. Simulation results 

The program has been implemented in C and was executed on UltraSparc 
machines, running at 300 MHz, among others. The largest grid that has been
experimented with was 100 x 100 tracks, resulting in 200 train processes running concurrently.
It was found that a single machine could not fulfil the processing requirements
and consequently executed the train programs at a lower periodicity. Instead of
the requested 5 Hz, programs cycled at approximately 3 Hz. This still presents an aver- 
age number of SPLICE dataspace updates of 30000 per second, in addition to an equal 
number of read operations and 600 executions per second of the Finite State Automa- 
ton that implements the decision procedure for a train. This resilience against overload 
is typical of systems designed for a shared-data architecture, and strongly contrasts 
with other design approaches. 

Running the same experiment on two machines, connected by standard (10 Mbit/sec)
Ethernet, proved near the limit of the communication channel, but easy on the
processors, which achieved 50000 dataspace updates per second. The limitation caused
by the channel is largely due to the fact that all write operations on the shared 
dataspace result in separate packets; since there is little data in a packet, this is far from 
optimal. Since the program requires all communication to be reliable, frequent colli- 
sions can have a snowballing effect: SPLICE guarantees the order of messages in reli- 
able communication, and missing messages will therefore be retransmitted. Once the 
effective capacity of the network is reached, this can easily lead to an explosion of col- 
lisions. An improvement can be obtained by collecting a number of data elements in a 
single packet: this will reduce the number of Ethernet packets, at the cost of somewhat 
greater latency. SPLICE provides mechanisms that allow this packaging to occur auto- 
matically, without having to compromise the solution (thus, maintaining the independ- 
ence between individual train programs in particular). 
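
The trade-off can be made concrete with a small calculation. The function below is an illustration only: `packets_per_second` is a hypothetical helper, and it assumes that every packet is filled with the same number of data elements, which the automatic packaging in SPLICE need not do.

```c
/*
 * Number of network packets per second when "batch" data elements are
 * collected into each packet. A batch of 0 is treated as no batching,
 * i.e. one packet per write operation.
 */
unsigned long packets_per_second(unsigned long writes_per_second,
                                 unsigned long batch)
{
    if (batch == 0)
        return writes_per_second;
    /* round up: a partially filled packet still has to be sent */
    return (writes_per_second + batch - 1) / batch;
}
```

With ten elements per packet, for instance, the packet rate drops by a factor of ten, while an element may wait for up to nine later elements before it is sent.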

6. Conclusion 

Due to the inherent complexity of the environment in which large-scale embedded 
systems operate, combined with the stringent requirements regarding temporal behav- 
iour, availability, robustness, and maintainability, the design of these systems is an 
intricate task. Coordination models offer the potential of separating functional require- 
ments from other aspects of system design. We have presented a software architecture 
for large-scale embedded systems that incorporates a separate coordination model. We 
have demonstrated how, starting from a relatively simple model based on a shared data 
space, the model can be successively refined to meet the requirements that are typical 
for this class of systems. Finally, we have indicated how formal techniques can be
used to support development of SPLICE-based systems. 

Over the past years SPLICE has been applied in the development of commercially 
available command-and-control, and traffic management systems. These systems typically
consist of some 1000 applications running on close to 100 processors interconnected
by a hybrid communication network. Experience with the development of these
systems confirms that the software architecture, including all of the refinements dis- 
cussed, significantly reduces the complexity of the design process [4]. (In [2] the 
authors argue that SPLICE lacks global control mechanisms, and that consequently 
understanding and debugging systems tend to be difficult. Unfortunately, the argu- 
ments given are either inapplicable or wrong; in addition, our experience does strongly 
contradict their statement.) Due to the high level of decoupling between processes, 
these systems are relatively easy to develop and integrate in an incremental way. More- 
over, distribution of processes and data, fault-tolerant behaviour, graceful degradation, 
and dynamic reconfiguration are directly supported by the architecture.

Abstract. We introduce a logical and mathematical theory for the specification
of system components and the typical steps of the development
process. In particular, we identify three patterns of development 

• refinement within one level of abstraction, 

• transition from one level of abstraction to the other, 

• implementation by glass box refinement. 

We introduce refinement relations to capture these three dimensions of the 
development space. We give verification conditions for these refinement 
steps. In this way, a logical basis for the development of systems is 
described. 



1 Introduction 

For a discipline of modular system development firmly based on a scientific theory we 
need a clear notion of components and ways to manipulate and to compose them. In 
this paper, we introduce a mathematical model of a component with the following 
characteristics: 

• A component is interactive. 

• It is connected with its environments by named and typed channels. 

• It receives input messages from its environment on its input channels and generates 
output messages to its environment on its output channels. 

• A component can be nondeterministic. This means that for a given input history 
there may exist several output histories that the component may produce. 

• The interaction between the component and its environment takes place in a global 
time frame. 

Throughout this paper we work exclusively with discrete time. Discrete time is a 
sufficient model for most of the typical applications for information processing 
systems. For an extension of our model to continuous time see [16]. 

Based on the ideas of an interactive component we can define forms of composition. 
We basically introduce only one form of composition, namely parallel composition 
with feedback. This form of composition allows us to model concurrent execution and 
interaction of components within a network. We briefly show that other forms of 
composition can be introduced as special cases of parallel composition with feedback. 
For the systematic stepwise development of components we introduce the concept
of refinement. We study three refinement relations, namely property refinement, glass
box refinement, and interaction refinement. We claim that these notions of refinement
are all that we need for systematic top-down system development.

Finally, we outline that our approach is compositional. This means that a refine- 
ment step for a composed system is obtained by refinement steps for its components. 
As a consequence, global reasoning about the system can be structured into local 
reasoning about the components. Compositionality relates to modularity in systems 
engineering. The contribution of this paper is the relational version of the stream 
processing approach as developed at the Technische Universität München (under the
keyword FOCUS, see [11], [12]). Moreover, the paper aims at a brief survey on this 
approach. 

We begin with the informal introduction of the concept of interactive components. 
This concept is based on communication histories called streams that are introduced in 
section 3. Then a mathematical notion of a component is introduced in section 4 and 
illustrated by a number of simple examples. Section 5 treats operators for composing 
components into distributed systems. In section 6 we introduce three notions of 
refinements to develop systems and show the compositionality of these notions. Again 
all described concepts are illustrated by simple examples. 



2 Central Notion: Component 

We introduce the mathematical notion of a component and on this basis a concept of 
component specification. A component specification is given by a description of the 
syntactic interface and a logical formula that relates input and output histories. 

The notion of component is essential in systems engineering and software 
engineering. Especially in software engineering a lot of work is devoted to the concept 
of software architecture and to the idea of componentware. Componentware is a 
catchword in software engineering (see [15]) for a development method where software 
systems are composed from given components such that main parts of the systems do 
not have to be reimplemented every time again but can be obtained by new 
configurations of existing software solutions. Key issues for such an approach are well-designed
interfaces and software architectures. Software architectures can mainly be
described as distributed systems, composed of components. For this, a clean and clear 
concept of a component is needed. 

In software engineering literature the following informal definition of a component 
is found: 



A component is a physical encapsulation of related services according to a published specification.



According to this definition we work with the idea of a component which encapsulates a 
local state or a distributed architecture. We provide a logical way to write a specification 
of component services. We will relate these notions to glass box views, to the derived 
black box views, and to component specifications. 



3 Streams 

A stream is a finite or infinite sequence of messages or of actions. Streams are used to 
represent communication histories for channels or histories of activities. Let M be a 
given set of messages. A stream over the set M is a finite or an infinite sequence of 
elements from M. We use the following notation: 
$M^*$ denotes the finite sequences over $M$, including the empty sequence $\langle\rangle$,

$M^\infty$ denotes the infinite sequences over $M$ (that can be represented by mappings
$\mathbb{N} \setminus \{0\} \to M$).

A stream is an element of the set $M^\omega$, which is defined by

$M^\omega = M^* \cup M^\infty$

On streams we specify the prefix ordering $\sqsubseteq$ for $x, y \in M^\omega$ by the formula

$x \sqsubseteq y \equiv \exists z \in M^\omega: x^\frown z = y$

Here $x^\frown z$ denotes the concatenation of the stream $x$ to the stream $z$. If $x$ is infinite then
$x^\frown z = x$.

Throughout this paper we do not work with the simple concept of a stream as 
introduced so far but find it more appropriate to work with so called timed streams. A 
timed stream represents an infinite history of communications over a channel or an 
infinite history of activities that are carried out in a discrete time frame. The discrete 
time frame represents time as an infinite chain of time intervals of equal length. In each 
time interval a finite number of messages can be communicated or a finite number of 
actions can be executed. Therefore we represent a communication history of a system 
model with such a discrete time frame by an infinite sequence of finite sequences of 
messages or actions. By 

$(M^*)^\infty$

we denote the set of timed streams. The $k$-th sequence $s.k$ in a timed stream $s \in (M^*)^\infty$
represents the sequence of messages exchanged on the channel in the $k$-th time interval
or the sequence of actions executed in the $k$-th time interval.

In the following, we use streams exclusively to model the communication histories 
of sequential communication media called channels. In general, in a system several 
communication streams occur. Therefore we work with channels to identify the 
individual communication streams. Hence, in our approach, a channel is just an 
identifier in a system that is related to a stream in every execution of the system. 

Throughout this paper we use some simple notation for streams, listed
in the following. For a timed stream $x$ we write:

$z^\frown x$   concatenation of a sequence $z$ to a stream $x$,

$x{\downarrow}i$   sequence of the first $i$ sequences in the stream $x$,

$S © x$   stream obtained from $x$ by deleting all messages that are not elements of the set $S$,

$\bar{x}$   finite or infinite stream that is the result of concatenating all sequences in $x$.

We may also consider timed streams of states to model the traces of state-based system 
models. In the following, we restrict ourselves to message passing systems, however. 
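The stream operators listed above can be tried out on finite prefixes. The following is a minimal Python sketch, assuming that a timed stream is truncated to a finite list of time intervals, each holding a tuple of messages. The names `prefix`, `filter_stream`, and `time_abstraction` are illustrative and do not come from the paper.

```python
from typing import Hashable, List, Set, Tuple

# A finite prefix of a timed stream: one tuple of messages per time interval.
TimedPrefix = List[Tuple[Hashable, ...]]


def prefix(x: TimedPrefix, i: int) -> TimedPrefix:
    """Return the first i intervals of x (the prefix operator)."""
    return x[:i]


def filter_stream(s: Set[Hashable], x: TimedPrefix) -> TimedPrefix:
    """Delete all messages that are not in s, keeping the interval structure."""
    return [tuple(m for m in interval if m in s) for interval in x]


def time_abstraction(x: TimedPrefix) -> Tuple[Hashable, ...]:
    """Concatenate all intervals of x into one untimed sequence."""
    return tuple(m for interval in x for m in interval)


x = [("a", "b"), (), ("c",), ("a",)]
print(prefix(x, 2))
print(filter_stream({"a"}, x))
print(time_abstraction(x))
```

Working on finite prefixes is a simplification: the timed streams of the text are infinite, so the sketch only approximates them up to a chosen time bound.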



4 Syntactic and Semantic Interfaces 

In this section we introduce a mathematical notion of components. We work with typed 
channels. 
4.1 I/O-Functions



Let a set S of sorts or types be given. By C we denote a set of typed channels. We 
assume that we are given a type assignment for the channels in the set C: 

$\mathrm{type}: C \to S$

Given a set $C$ of typed channels, a channel valuation is an element of the set $\vec{C}$ defined as
follows (let $M$ be the set of all messages; by $[T]$ we denote the set of elements of a type $T$):

$\vec{C} = \{x: C \to (M^*)^\infty : \forall c \in C: x.c \in ([\mathrm{type}(c)]^*)^\infty\}$

A channel valuation $x \in \vec{C}$ associates a stream of elements of type $\mathrm{type}(c)$ with each
channel $c \in C$. The operators on streams induce operators on valuations by pointwise
application.

Fig. 1. Graphical Representation of a Component F with Input Channels I and Output Channels O

Given a set of typed input channels I and a set of typed output channels O we introduce 
the notion of a syntactic interface of a component: 

(I, O) syntactic interface, 

I set of typed input channels and, 

O set of typed output channels. 

In addition to the syntactic interface we need a concept for describing the behavior of a 
component. A behavior is a relation between the input histories and the output 
histories. 

Input histories are represented by valuations of the input channels and output
histories are represented by valuations of the output channels. We represent the black
box behavior of a component by a set-valued function, the semantic interface:

$F: \vec{I} \to \wp(\vec{O})$

Given $x \in \vec{I}$, by $F.x$ we denote the set of all output histories which a component with
behavior $F$ may produce on the input $x$.

Of course, a set-valued function is, as is well known, isomorphic to a relation. We
prefer set-valued functions to emphasize the roles of input and output. We call the
function $F$ an I/O-function.
4.2 Specification of I/O-Functions

Using logical means, an I/O-function F can be described by a logical formula relating 
the streams on the input channels to the streams on the output channels. Syntactically 
therefore such a formula uses typed channels as identifiers for streams. 

A specification of a component provides the following information:

• its syntactic interface, describing how the component is connected to its
environment,

• its behavior, given by a specifying formula $\Phi$ relating input and output channel
valuations.

This way we obtain a specification technique that gives us a very powerful method to 
describe components. 

Example. As simple but very fundamental examples of components we specify a 
merge component MRG, a transmission component TMC, and a fork component FRK 
as follows: 

MRG

in x: T1, y: T2
out z: T3
x = T1 © z
y = T2 © z



Here let T1, T2, T3 be types (in our case we can see types simply as sets) where T1 and
T2 are assumed to be disjoint and T3 is the union of the sets of elements of type T1 and
T2. We specify the proposition $x \sim y$ for timed streams $x$ and $y$ of arbitrary type $T$ by
the logical equivalence:

$x \sim y \equiv (\forall m \in T: \{m\} © x = \{m\} © y)$

Based on this definition we specify the component TMC. 

TMC 
in z: T3 
out z: T3 
z ~ z' 



Here we use the convention for channel identifiers z that occur both as input and as 
output channels: in the specifying formula we write z' to denote the output channel z. 
The simple specification TMC states that every input message occurs also as output 
message by the component, and vice versa. However, messages may be arbitrarily 
delayed and overtake each other. 

FRK

in z: T3
out x: T1, y: T2
x = T1 © z
y = T2 © z
Note that the merge component as well as the TMC component, as they are specified
here, are fair: every input is eventually processed and reproduced as output. □
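The TMC specification can be illustrated on finite histories. The following Python sketch rests on two assumptions that are not stated in this form in the text: each finite list is treated as a complete history, and the relation ~ is read as comparing how often each message occurs over the whole history, in line with the remark that messages may be delayed and may overtake each other. The names `message_counts` and `tmc_holds` are illustrative.

```python
from collections import Counter
from typing import Hashable, List, Tuple

# A finite timed history: one tuple of messages per time interval.
TimedHistory = List[Tuple[Hashable, ...]]


def message_counts(x: TimedHistory) -> Counter:
    """Count how often each message occurs, ignoring timing and order."""
    return Counter(m for interval in x for m in interval)


def tmc_holds(z_in: TimedHistory, z_out: TimedHistory) -> bool:
    """Check the TMC formula under the counting assumption stated above."""
    return message_counts(z_in) == message_counts(z_out)


z_in = [("a", "b"), ("a",), ()]
delayed = [(), ("b",), ("a", "a")]  # same messages, delayed and reordered
lossy = [(), ("b",), ("a",)]        # one "a" is lost

print(tmc_holds(z_in, delayed))
print(tmc_holds(z_in, lossy))
```

The delayed history satisfies the check while the lossy one does not, which mirrors the statement that every input message also occurs as an output message, and vice versa.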

We use the following notation for a component F to refer to the constituents of its 
syntactic interface: 

In(F) the set of input channels I, 

Out(F) the set of output channels O. 

By the specifying formula of a specification of an I/O-function F we may prove 
properties about the function F. 

4.3 Properties of I/O-Functions 

In the following we introduce some basic properties for I/O-functions. An I/O-function

$F: \vec{I} \to \wp(\vec{O})$

is called

• properly timed, if for all times $i \in \mathbb{N}$ we have

$x{\downarrow}i = z{\downarrow}i \Rightarrow F(x){\downarrow}i = F(z){\downarrow}i$

• time guarded (or causal), if for all times $i \in \mathbb{N}$ we have

$x{\downarrow}i = z{\downarrow}i \Rightarrow F(x){\downarrow}(i+1) = F(z){\downarrow}(i+1)$

• partial, if $F(x) = \emptyset$ for some $x \in \vec{I}$, and total otherwise,

• realizable, if there exists a time guarded function $f: \vec{I} \to \vec{O}$ such that

$\forall x \in \vec{I}: f.x \in F.x$

• fully realizable, if for all $x \in \vec{I}$: $F.x = \{f.x: f \in [F]\}$

Here $[F]$ denotes the set of time guarded functions $f: \vec{I} \to \vec{O}$, where $f.x \in F.x$ for
all $x$.

• time independent (see [9]), if $x = z \Rightarrow F.x = F.z$

A specifying formula $\Phi$ for a component with the set of input channels $I$ and the set of
output channels $O$ represents a predicate

$p: \vec{I} \times \vec{O} \to \mathbb{B}$

This predicate defines an I/O-function

$F: \vec{I} \to \wp(\vec{O})$

by the equation (for $x \in \vec{I}$)

$F.x = \{y \in \vec{O}: p(x, y)\}$

Given a specification with a specifying formula, we may either prove that the specified
I/O-function fulfills certain of the properties introduced above, or add
certain of these properties as schematic requirements to the specification.

Adding time-guardedness as a requirement on top of the predicate $p$ leads to the
inclusion greatest function $F'$ such that $y \in F'.x$ implies $p(x, y)$ and $F'$ is time guarded.
This way we obtain the following definition for the function $F'$. Let $F'$ be the inclusion
greatest function such that:

$F'.x = \{y \in \vec{O}: p(x, y) \wedge \forall x' \in \vec{I}, k \in \mathbb{N}:$

$\quad x{\downarrow}k = x'{\downarrow}k \Rightarrow \exists y' \in \vec{O}: y{\downarrow}(k+1) = y'{\downarrow}(k+1) \wedge y' \in F'.x'\}$

Time guardedness adds the principle of causality between input and output to a 
specification. 
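For a deterministic function, time guardedness can be checked by brute force on a small set of finite input prefixes. The sketch below is a Python illustration under that restriction; the helper `is_time_guarded` and the two sample functions `identity` and `unit_delay` are hypothetical and only serve to show the difference between a function that reacts within the same time interval and one that reacts one interval later.

```python
from itertools import product
from typing import Callable, Hashable, List, Tuple

# A finite timed prefix: a tuple of intervals, each a tuple of messages.
TimedPrefix = Tuple[Tuple[Hashable, ...], ...]


def is_time_guarded(f: Callable[[TimedPrefix], TimedPrefix],
                    inputs: List[TimedPrefix]) -> bool:
    """Check: equal inputs up to time i imply equal outputs up to time i + 1."""
    for x, z in product(inputs, repeat=2):
        for i in range(min(len(x), len(z)) + 1):
            if x[:i] == z[:i] and f(x)[: i + 1] != f(z)[: i + 1]:
                return False
    return True


def identity(x: TimedPrefix) -> TimedPrefix:
    """Forward every message in the interval in which it arrives."""
    return x


def unit_delay(x: TimedPrefix) -> TimedPrefix:
    """Forward every message exactly one interval later."""
    return ((),) + x


intervals = [(), ("a",), ("b",)]
inputs = list(product(intervals, repeat=2))  # all histories of two intervals

print(is_time_guarded(identity, inputs))
print(is_time_guarded(unit_delay, inputs))
```

The identity fails the check because its output in an interval depends on the input of the same interval, whereas the delayed version passes.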

Example: Transmission Component 

Consider the transmission component TMC of the example above. In this case we have
$p(x, y) = (x \sim y)$. Assuming time guardedness we get the function

$F'.x = \{y: x \sim y \wedge \forall x', k \in \mathbb{N}: x{\downarrow}k = x'{\downarrow}k \Rightarrow \exists y': y{\downarrow}(k+1) = y'{\downarrow}(k+1) \wedge x' \sim y'\}$

From this we easily prove

$y \in F'.x \Rightarrow \forall m \in T3, k \in \mathbb{N}: \#\{m\} © x{\downarrow}k \geq \#\{m\} © y{\downarrow}(k+1)$

This formula is a simple consequence of the fact that for each input history $x$ we can
find an input history $x'$ such that $x{\downarrow}k = x'{\downarrow}k$ and

$x'{\downarrow}k = x'$ □

Time-guardedness is a very basic notion. It models the asymmetry between input and 
output. For time independent deterministic I/O-functions time guardedness has a strong 
relationship to prefix monotonicity. 

As pointed out above, notions like time independence or time guardedness are 
logical properties that can be either added as properties to specifications explicitly or 
proved for certain specifications. It is easy to show for instance that MRG, TMC, and
FRK are time independent. If we add time guardedness as a requirement then all three
specified I/O-functions are fully realizable.



4.4 State Transition Specifications 

Often it is more appropriate to describe a component by a state transition system with 
input and output. In such a case we have to describe the data state, the initial state, and 
the state transition relation. 

We describe the data state of a transition system by a set of typed state attributes $V$
that can be seen as programming variables. Mathematically, a data state

$\eta: V \to \bigcup_{v \in V} [\mathrm{type}(v)]$

is a valuation of the attributes by values of the corresponding type. In addition we use a
finite set $W$ of control states. A state of the component is a pair $(w, \eta)$ consisting of a
control state and a data state. By $\Sigma$ we denote the set of all states.

A state transition machine is given by an initial state $\sigma_0$ and a state transition
function

$\Delta: (\Sigma \times (I \to M^*)) \to \wp(\Sigma \times (O \to M^*))$

Often it is helpful to describe a state transition machine graphically by a state transition
diagram. A state machine diagram consists of a number of nodes representing
control states and a number of transition rules represented by labeled arcs between the
control states. This is in particular an alternative to our logical characterization of
I/O-functions. A state transition machine also describes an I/O-function. We show in the
following how to associate an I/O-function with a state transition machine.

The state transition function $\Delta$ as introduced above describes a function

$B_\Delta: \Sigma \to (\vec{I} \to \wp(\vec{O}))$

that associates with every state $\sigma \in \Sigma$ an I/O-function $B_\Delta(\sigma)$ that describes the behavior
of the system in this state. $B_\Delta$ provides the black box view onto $\Delta$.

For each state $\sigma \in \Sigma$, each input pattern $z \in (I \to M^*)$, and each input channel
valuation $x \in \vec{I}$, we specify the black box function $B_\Delta$ for the given state $\sigma$ by the
equation

$B_\Delta(\sigma).(\langle z \rangle^\frown x) = \{\langle t \rangle^\frown y: \exists \sigma' \in \Sigma: (\sigma', t) \in \Delta(\sigma, z) \wedge y \in B_\Delta(\sigma').x\}$

This is a recursive definition of $B_\Delta$. In general, $B_\Delta$ is not uniquely determined by the
equation. We choose the inclusion greatest solution of the equation for $B_\Delta$.

State transition machines can also be used as implementations of I/O-functions. In
particular, we may generate program code for certain classes of state transition machines.
We come back to this issue under the heading glass box refinement.
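The recursive equation for the black box view can be unfolded directly on finite input prefixes. The following Python sketch makes several simplifying assumptions that are not part of the paper: there is a single input and a single output channel, the input is a finite tuple of intervals, and the recursion ends with the empty output prefix when the input is exhausted. The machine `lossy_delay` is a hypothetical example that buffers the messages of one interval and then either forwards or drops them.

```python
from typing import Callable, Hashable, Set, Tuple

Pattern = Tuple[Hashable, ...]   # messages within one time interval
State = Pattern                  # toy state: the messages currently buffered
Delta = Callable[[State, Pattern], Set[Tuple[State, Pattern]]]


def black_box(delta: Delta, state: State,
              inputs: Tuple[Pattern, ...]) -> Set[Tuple[Pattern, ...]]:
    """Unfold the recursive equation on a finite input prefix."""
    if not inputs:
        return {()}  # assumption: an empty input prefix yields the empty output prefix
    z, rest = inputs[0], inputs[1:]
    result = set()
    for next_state, t in delta(state, z):
        for y in black_box(delta, next_state, rest):
            result.add((t,) + y)
    return result


def lossy_delay(state: State, z: Pattern) -> Set[Tuple[State, Pattern]]:
    """Buffer the new messages; forward or drop the previously buffered ones."""
    return {(z, state), (z, ())}


outputs = black_box(lossy_delay, (), (("a",), ("b",), ()))
for y in sorted(outputs):
    print(y)
```

The nondeterminism of the transition function shows up as several possible output histories for one input history, which is exactly what the set-valued result expresses.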



5 Composition Operators 



In this section we introduce a notion of composition for components. We prefer to 
introduce only one very general form of composition and later define a number of other 
composing forms as special cases. 




Fig. 2. Parallel Composition with Feedback 

Given I/O-functions with disjoint sets of output channels (where $O_1 \cap O_2 = \emptyset$)

$F_1: \vec{I}_1 \to \wp(\vec{O}_1), \quad F_2: \vec{I}_2 \to \wp(\vec{O}_2)$

we define the parallel composition with feedback by the I/O-function

$F_1 \otimes F_2: \vec{I} \to \wp(\vec{O})$

where the syntactic interface is specified by

$I = (I_1 \cup I_2) \setminus (O_1 \cup O_2), \quad O = (O_1 \cup O_2) \setminus (I_1 \cup I_2).$

The resulting function is specified by the following equation (here $y \in \vec{C}$ where $C = I_1 \cup I_2 \cup O_1 \cup O_2$):

$(F_1 \otimes F_2).x = \{y|O: y|I = x|I \wedge y|O_1 \in F_1(y|I_1) \wedge y|O_2 \in F_2(y|I_2)\}$

Here $y$ denotes a valuation of all the channels of $F_1$ and $F_2$. By $y|C$ we denote the
restriction of the valuation $y$ to the channels in $C$.
Let $\Phi_1$ and $\Phi_2$ be the specifying formulas for the functions $F_1$ and $F_2$ respectively;
we obtain the specifying formula of $F_1 \otimes F_2$ simply by

$\exists z_1, \ldots, z_k: \Phi_1 \wedge \Phi_2$

where $\{z_1, \ldots, z_k\} = (I_1 \cap O_2) \cup (I_2 \cap O_1)$ are the internal channels of the system.

For this form of composition we can prove the following facts by rather 
straightforward proofs: 

(1) if the $F_i$ are time guarded for $i = 1, 2$, so is $F_1 \otimes F_2$,

(2) if the $F_i$ are realizable for $i = 1, 2$, so is $F_1 \otimes F_2$,

(3) if the $F_i$ are fully realizable for $i = 1, 2$, so is $F_1 \otimes F_2$,

(4) if the $F_i$ are time independent for $i = 1, 2$, so is $F_1 \otimes F_2$.

If the $F_i$ are total and properly timed for $i = 1, 2$, we cannot conclude, however, that the
function $F_1 \otimes F_2$ is total. This shows that the composition works in a modular
way only for well-chosen subclasses of specifications.

Some further forms of composition that can be defined by $\otimes$ are listed in the
following (we do not give formal definitions for them, since these are quite
straightforward):

• renaming of channels: $F[c/c']$

• feedback without hiding: $\mu F$

let $F: \vec{I} \to \wp(\vec{O})$, then we define $\mu F: \vec{J} \to \wp(\vec{O})$ where $J = I \setminus O$ by the
equation (here we assume $y \in \vec{C}$ where $C = I \cup O$):

$(\mu F).x = \{y|O: y|I = x|I \wedge y|O \in F(y|I)\}$

• sequential composition $F_1 ; F_2$

Sequential composition of the components $F_1$ and $F_2$ requires $O_1 = \mathrm{Out}(F_1) = \mathrm{In}(F_2) = I_2$.

In the special case where $O_1 = I_2 = (O_1 \cup O_2) \cap (I_1 \cup I_2)$ we can reduce sequential
composition to parallel composition with feedback as follows:

$F_1 ; F_2 = F_1 \otimes F_2$

Simple examples of sequential composition (where $O_1 = I_2$) are the composed
components MRG;FRK and FRK;MRG.
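Since the text omits a formal definition of sequential composition, the following Python sketch assumes the usual relational reading: an output of the composed component is any output that the second component may produce on some output of the first. Histories are simplified to untimed tuples, and the components `reorder` and `upper_case` are hypothetical toy examples.

```python
from itertools import permutations
from typing import Callable, Set, Tuple

History = Tuple[str, ...]
IOFunction = Callable[[History], Set[History]]


def seq_compose(f1: IOFunction, f2: IOFunction) -> IOFunction:
    """Sequential composition, assuming the usual relational composition."""
    def composed(x: History) -> Set[History]:
        return {z for y in f1(x) for z in f2(y)}
    return composed


def reorder(x: History) -> Set[History]:
    """Nondeterministic component: may emit the messages in any order."""
    return set(permutations(x))


def upper_case(x: History) -> Set[History]:
    """Deterministic component: maps every message to upper case."""
    return {tuple(m.upper() for m in x)}


pipeline = seq_compose(reorder, upper_case)
print(sorted(pipeline(("a", "b"))))
```

Representing components as functions that return sets keeps the roles of input and output explicit, in the spirit of the set-valued I/O-functions of section 4.1.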

Example: Feedback for the Component TMC 

In this example we study how the result of a composing form like feedback depends
on the additional requirement of time guardedness. If we do not require time guardedness
then the specification $\mu$ TMC boils down to the specifying formula

$z \sim z$

which is equivalent to true. Assuming time guardedness we get in addition the
requirement

$\forall k \in \mathbb{N}: z{\downarrow}k \sim z{\downarrow}(k+1)$

from which we can conclude, using the fact that $z{\downarrow}0 = \langle\rangle$, by induction that $z = \langle\rangle$. □

This example demonstrates that time guardedness is an essential assumption to make
feedback into an operator that mirrors the causality of the operational data flow
principle.



6 Refinement for System Development 

Refinement relations (see [13]) are the key to formalizing development steps (see [8]) and
the development process as it is advocated in software engineering process models. We
work with the following basic ideas of refinement relations:

• property refinement - enhancing requirements - allows us to add properties to a 
specification, 

• glass box refinement - designing implementations - allows us to decompose a 
component into a distributed system or to give a state transition description for a 
component specification, 

• interaction refinement - relating levels of abstraction - allows us to change the 
granularity of the interaction, the number and types of the channels of a component 
(see [10]). 

We claim that these notions of refinement are sufficient to describe all the steps needed 
in the idealistic view of a strict hierarchical top down system development. The three 
refinement concepts mentioned above are explained in detail in the following. 



6.1 Property Refinement 

Property refinement allows us to replace an I/O-function by one with additional
properties. A behavior

$F: \vec{I} \to \wp(\vec{O})$

is refined by a behavior

$\hat{F}: \vec{I} \to \wp(\vec{O})$

if

$\hat{F} \subseteq F$

This relation stands for the proposition

$\forall x \in \vec{I}: \hat{F}(x) \subseteq F(x).$

Obviously, property refinement is a partial order. In particular, the refinement relation of
property refinement is transitive, which guarantees that iterated steps of property
refinement can be composed into one step of property refinement.
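On a finite set of sample inputs, property refinement reduces to a pointwise subset test. The Python sketch below illustrates this with untimed tuples as histories; the helper `refines` and the components `any_order` and `same_order` are hypothetical names chosen for this illustration.

```python
from itertools import permutations, product
from typing import Callable, Iterable, Set, Tuple

History = Tuple[str, ...]
IOFunction = Callable[[History], Set[History]]


def refines(f_hat: IOFunction, f: IOFunction, inputs: Iterable[History]) -> bool:
    """Check that f_hat(x) is a subset of f(x) for every sample input x."""
    return all(f_hat(x) <= f(x) for x in inputs)


def any_order(x: History) -> Set[History]:
    """Loose specification: the messages may be emitted in any order."""
    return set(permutations(x))


def same_order(x: History) -> Set[History]:
    """Stricter specification: the messages are emitted unchanged."""
    return {x}


# All histories of length 0, 1, or 2 over the messages "a" and "b".
samples = [h for n in range(3) for h in product("ab", repeat=n)]

print(refines(same_order, any_order, samples))
print(refines(any_order, same_order, samples))
```

Such a test can only refute a refinement claim or raise confidence in it on the sampled inputs; the proposition itself quantifies over all input histories.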

A property refinement is a basic refinement step as it is needed in requirements 
engineering. In the process of requirement engineering, typically the overall services of a 
system are specified. This, in general, is done by requiring more and more sophisticated 
properties for components until a desired behavior is specified. 

Example. A component that transmits the input on its two input
channels to its two output channels (but does not necessarily preserve the order) is
specified as follows.

TM2

in x: T1, y: T2
out x: T1, y: T2

$\forall m \in T1: \{m\} © x' = \{m\} © x$

$\forall m \in T2: \{m\} © y' = \{m\} © y$

We want to relate this specification to the simple specification of the time independent 
identity TII that reads as follows: 

TII

in x: T1, y: T2
out x: T1, y: T2
$x' = x \wedge y' = y$



Given these two specifications we immediately obtain that TII is a property refinement 
of TM2. 



$\mathrm{TII} \subseteq \mathrm{TM2}$

A proof of this relation is straightforward (see below). □

The verification conditions for property refinement are easily obtained as follows. For
given specifications $S_1$ and $S_2$ with specifying formulas $\Phi_1$ and $\Phi_2$, the specification $S_2$
is a property refinement of $S_1$ if the syntactic interfaces of $S_1$ and $S_2$ coincide and if for
the specifying formulas $\Phi_1$ and $\Phi_2$ we have

$\Phi_1 \Leftarrow \Phi_2$

In our example the verification condition is easily obtained and reads as follows:

$(\forall m \in T1: \{m\} © x' = \{m\} © x) \Leftarrow x' = x$
$\wedge\; (\forall m \in T2: \{m\} © y' = \{m\} © y) \Leftarrow y' = y$

The proof of this condition is obvious.

Property refinement can also be used to relate composed components to given 
components (see also glass box refinement in section 6.3). For instance, we obtain the 
following refinement relation 

$(\mathrm{MRG} ; \mathrm{FRK}) \subseteq \mathrm{TII}$

Again the proof is quite straightforward. 

As we have shown, adding schematic properties such as time guardedness,
time independence, or realizability to a specification is a
strengthening of the specifying predicate. Therefore it is a step in the property
refinement relation.

Property refinement is characteristic for the development steps in requirements 
engineering. It is also used in the design process where decisions are taken that 
introduce further properties for the components. 



6.2 Compositionality of Property Refinement 

In our case, the proof of the compositionality of property refinement is simple. This is
a straightforward consequence of the simple definition of composition. The rule of
compositional property refinement reads as follows:

$\dfrac{\hat{F}_1 \subseteq F_1 \qquad \hat{F}_2 \subseteq F_2}{\hat{F}_1 \otimes \hat{F}_2 \subseteq F_1 \otimes F_2}$

The proof of the soundness of this rule is straightforward by the monotonicity of the
operator $\otimes$ with respect to set inclusion.
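The monotonicity argument can be spelled out in a few lines. The following derivation is a sketch that unfolds the definition of parallel composition with feedback from section 5; the symbols $\hat{F}_1$ and $\hat{F}_2$ stand for the refining functions, and $w$ is an arbitrary output history introduced here for the argument.

```latex
\begin{align*}
w \in (\hat{F}_1 \otimes \hat{F}_2).x
 &\iff \exists y \in \vec{C}:\; w = y|O \wedge y|I = x|I
       \wedge y|O_1 \in \hat{F}_1(y|I_1) \wedge y|O_2 \in \hat{F}_2(y|I_2) \\
 &\implies \exists y \in \vec{C}:\; w = y|O \wedge y|I = x|I
       \wedge y|O_1 \in F_1(y|I_1) \wedge y|O_2 \in F_2(y|I_2) \\
 &\iff w \in (F_1 \otimes F_2).x
\end{align*}
```

The middle step uses the two premises pointwise: $\hat{F}_i(y|I_i) \subseteq F_i(y|I_i)$ for $i = 1, 2$.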

Example. For our example the application of the rule of compositionality reads as
follows. Suppose we use a specific component MRG1 for merging two streams. It is
defined by

MRG1

in x: T1, y: T2
out z: T3
$z = \langle\langle\rangle\rangle^\frown f(x, y)$

where

$f(\langle s \rangle^\frown x, \langle t \rangle^\frown y) = \langle s^\frown t \rangle^\frown f(x, y)$

Note that this merge component MRG1 is both deterministic and time dependent.
According to our rule of compositionality and the transitivity of refinement, it is sufficient
to prove

$\mathrm{MRG1} \subseteq \mathrm{MRG}$

to conclude

$\mathrm{MRG1} ; \mathrm{FRK} \subseteq \mathrm{MRG} ; \mathrm{FRK}$

and by the transitivity of the refinement relation

$\mathrm{MRG1} ; \mathrm{FRK} \subseteq \mathrm{TII}$

This shows how local refinement steps and their proofs are schematically extended to
global proofs. □

The usage of the composition operator and the relation of property refinement leads to a 
design calculus for requirements engineering. It includes steps of decomposition and 
implementation that are treated more systematically in the following section on glass 
box refinement. 



6.3 Glass Box Refinement 

Glass Box Refinement is the classical concept of refinement that we need and use in the 
design phase. In glass box refinement we replace a system description by a more detailed 
one adding implementation information. 

In the design phase we typically decompose a system with a specified black box
behavior into a distributed system architecture or we represent (implement) its behavior
by a state transition machine. In other words, a glass box refinement of a component $F$
is a special case of a property refinement where the refined system descriptions are of the
form

$F_1 \otimes F_2 \otimes \ldots \otimes F_n \subseteq F$   (design of an architecture)

or of the form

$B_\Delta(\sigma_0) \subseteq F$   (implementation by a state machine)

where the I/O-function $B_\Delta(\sigma_0)$ is defined by a state machine $\Delta$ (see [19] and section 4.4)
and $\sigma_0$ is its initial state.

Accordingly, a glass box refinement is a special case of property refinement where 
the refining component has a special syntactic form. In the case of a glass box 
refinement that transforms a component into a network, this form is a term composed of 
a number of components. 

Example. A very simple instance of such a glass box refinement is already shown by 
the proposition 

$\mathrm{MRG} ; \mathrm{FRK} \subseteq \mathrm{TII}$

It allows us to replace the component TII by two components. □ 

Hence, a glass box refinement works with the relation of property refinement and special 
terms representing the refining component. Thus the construction of implementations 
and their correctness proofs can be carried out fully within the framework of refinement.
The compositionality of glass box refinement is a straightforward consequence of the 
compositionality of property refinement. 



6.4 Interaction Refinement

Interaction refinement is the refinement notion that we need for modeling development 
steps between levels of abstraction. Interaction refinement allows us to change 

• the number and names of input and output channels, 

• the granularity of the messages on the channels 
of a component. 

An interaction refinement requires two functions

$A: \vec{C'} \to \wp(\vec{C}) \qquad R: \vec{C} \to \wp(\vec{C'})$

that relate the abstract with the concrete level of a development step leading from one
level of abstraction to the other.

Given an abstract history $x \in \vec{C}$, each history $y \in R(x)$ denotes a concrete history
representing $x$ on the concrete level. Calculating a representation for a given abstract
history and then its abstraction yields the old abstract history again. This is expressed by
the following requirement:

$R ; A = \mathrm{Id}$



Let Id denote the identity relation. A is called the abstraction and R is called the 
representation. R and A are called a refinement pair. 
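The requirement on a refinement pair can be checked exhaustively for a small toy encoding. In the Python sketch below, abstract histories are untimed tuples over the messages 0 to 3 and concrete histories are tuples of bits, which illustrates a change of message granularity. The functions `represent`, `abstract`, and `is_refinement_pair` are hypothetical and are not the MRG/FRK pair of the paper.

```python
from itertools import product
from typing import Callable, Iterable, Set, Tuple

History = Tuple[int, ...]
Relation = Callable[[History], Set[History]]


def represent(x: History) -> Set[History]:
    """R: encode each abstract message 0..3 as two bits."""
    return {tuple(bit for m in x for bit in divmod(m, 2))}


def abstract(y: History) -> Set[History]:
    """A: decode pairs of bits; odd-length bit histories have no abstraction."""
    if len(y) % 2:
        return set()
    return {tuple(2 * y[i] + y[i + 1] for i in range(0, len(y), 2))}


def is_refinement_pair(r: Relation, a: Relation,
                       abstract_histories: Iterable[History]) -> bool:
    """Check that representing and then abstracting yields exactly the original."""
    return all({x2 for y in r(x) for x2 in a(y)} == {x}
               for x in abstract_histories)


# All abstract histories of length 0, 1, or 2.
histories = [h for n in range(3) for h in product(range(4), repeat=n)]

print(represent((2, 3)))
print(is_refinement_pair(represent, abstract, histories))
```

Here the representation is deterministic for brevity; the set-valued signature also admits representations that offer several concrete histories for one abstract history.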

For nontimed components it is sufficient to require for the time independent identity
TII (as a generalization of the specification TII given in section 6.1 to arbitrary channel
sets)

$R ; A \subseteq \mathrm{TII}$



Choosing the component MRG for R and FRK for A immediately gives a refinement
pair for nontimed components. Fig. 3 illustrates how the refinement pair relates the
abstract and the concrete levels.

Fig. 3. Communication History Refinement (panels: abstract level, concrete level)

Interaction refinement allows us to refine components, given appropriate refinement 
pairs for the input and output channels. The idea of an interaction refinement is 
visualized in Fig. 4. 




Fig. 4. Interface Interaction Refinement ($U^{-1}$-simulation) (panels: abstract level, concrete level)

Given interaction refinements 

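$A_I: \vec{I}_2 \to \wp(\vec{I}_1) \qquad R_I: \vec{I}_1 \to \wp(\vec{I}_2)$

$A_O: \vec{O}_2 \to \wp(\vec{O}_1) \qquad R_O: \vec{O}_1 \to \wp(\vec{O}_2)$

for the input and output channels we call the I/O-function

$\hat{F}: \vec{I}_2 \to \wp(\vec{O}_2)$

an interaction refinement of

$F: \vec{I}_1 \to \wp(\vec{O}_1)$

if one of the following propositions holds:

$\hat{F} \subseteq A_I ; F ; R_O$   ($U^{-1}$-simulation)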

These are different versions of useful relations between levels of abstractions. A more 
detailed discussion is found in [13]. 

Example. We obtain 
$\mathrm{TMC} \subseteq \mathrm{FRK} ; \mathrm{TII} ; \mathrm{MRG}$

as a simple example of interaction refinement by $U^{-1}$-simulation. The proof is again
straightforward. □

Interaction refinement is used heavily in many practical system developments, although
it is not introduced formally there. It supports the definition of a formal relation between
layers of abstraction. This way it can be used to relate the layers of protocol hierarchies,
the change of data representations for the messages or the states, or the introduction of
time into system developments. Interaction refinement is a Galois connection.

In our model, in particular, input and output histories are represented explicitly.
This allows us to apply classical ideas (see [17], [18]) of data refinement to
communication histories. Roughly speaking, communication histories are nothing but
data structures that can be manipulated and refined like other data structures.



7 Conclusions

What we have presented in the previous sections is a comprehensive method for
system and software development which supports all the steps of a hierarchical stepwise
refinement development method. It is compositional and therefore supports the
modularity requirements that are generally needed.

The presented method provides, in particular, the following ingredients: 

• a mathematical notion of a syntactic and semantic interface of a component, 

• a formal specification notation and method, 

• a precise notion of composition, 

• a mathematical notion of refinement and development, 

• a compositional development method, 

• a flexible concept of software architecture, 

• concepts of time and the refinement of time (see [16]). 

What we did not mention throughout the paper are concepts that are also available and 
helpful from a more practical point of view including 

• systematic combination with tables and diagrams, 

• tool support in the form of AutoFocus (see [4]). 

The simplicity of our results is a direct consequence of the specific choice of our 
semantic model. The introduction of time makes the model robust and expressive. The 
fact that communication histories are explicitly included allows us to avoid all kinds of 
complications like prophecies or stuttering and leads to an abstract relational view of 
systems. 

Of course, what we have presented is just the scientific kernel of the method. More 
pragmatic ways to describe specifications are needed. These more pragmatic 
specifications can be found in the work done in the SysLab-Project (see [7]) at the 
Technical University of Munich. For extensive explanations of the use of state 
transition diagrams, data flow diagrams and message sequence charts as well as several 
versions of data structure diagrams we refer to this work. 
Abstract. Constructing phylogenetic (or evolutionary) trees from biological
data is a classical problem in biology, and it still is a major challenge
today. Most realistic formulations of the problem, which take errors
into account, give rise to hard computational problems. In this survey
paper we concentrate on quartet based tree reconstruction methods. We
briefly describe the general tree reconstruction problem, and discuss the
motivation for using quartet based reconstruction. We then turn to the
computational complexity of this reconstruction task. Finally, we give a
high level description of some algorithms and heuristics for constructing
trees from quartets.



1 Introduction 

Given a set of taxa (a group of related biological species), the goal of phylogeny 
reconstruction is to build a tree which best represents the course of evolution 
for this set over time. The leaves of the tree are labeled with the given, extant 
taxa. Internal nodes correspond to hypothesized, extinct taxa. Because events of 
taxon divergence are assumed to be rare, the sought after tree is bifurcating (or 
binary), with internal nodes of degree 3. (In case of ambiguous data one might 
have to resort to multifurcating trees, which are less informative.) In early days, 
morphologic features were mostly used to study evolution. Today, molecular data 
are the primary basis for phylogenetic analysis of evolution, but other sources 
of information (for example paleontology, anatomy, and morphology) are also in 
use. For simplicity, most of our exposition will concentrate on molecular sequence 
data. 

The first step in constructing a tree is to collect from an updated database
either DNA (typically genes), RNA, or amino acid sequences for all taxa under
study. (For the sake of simplicity, we will restrict ourselves to proteins.) Homologous
sequences (detected by similarities, or low edit distances) from different
taxa are then grouped together. Homologous sequences for different taxa often
have the same functionality (e.g. insulin, hemoglobin, etc.) and are assumed to
be descendants of a common ancestral sequence. Their degree of similarity gives
an indication of the time when two taxa diverged. Since the mutational process
is assumed to be probabilistic in nature and to operate locally, we expect that

* Partially supported by the Fund for Promotion of Research at the Technion.
benny@cs.technion.ac.il.
longer periods of time since divergence imply more accumulated mutations. However,
different proteins may evolve at different rates. Combining this with the
stochastic nature of the process, it is clear that single proteins, viewed separately,
may give conflicting indications as to the history of evolution. To overcome
this “noise effect” it is thus advisable to employ longer sequences, obtained by
concatenating many homologous sequences together.

In general, phylogeny reconstruction methods are divided into character- 
based and distance-based methods. Character based methods work directly on 
character data that represent various biological features. These methods try to 
produce a tree which minimizes the total number of changes along tree edges. 
Distance based methods start by computing “evolutionary distances” between 
pairs of taxa. Then a tree with weighted edges whose pairwise tree distances 
approximate the evolutionary distances is sought. 
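As a small illustration of the first step of a distance based method, the following Python sketch computes pairwise distances between aligned sequences. It assumes the proportion of differing sites as a simple stand-in for an "evolutionary distance"; the text does not prescribe a particular distance measure, and the three toy sequences and taxon names are invented for this example.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for s, t in zip(a, b) if s != t) / len(a)


def distance_table(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Compute the distance for every unordered pair of taxa."""
    return {(u, v): p_distance(aligned[u], aligned[v])
            for u, v in combinations(sorted(aligned), 2)}


# Toy aligned sequences, invented for this illustration.
aligned = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "TCGAAC"}

for pair, d in distance_table(aligned).items():
    print(pair, round(d, 3))
```

A tree-building step would then search for an edge-weighted tree whose pairwise leaf distances approximate the values in this table.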



1.1 Organization 

In Section 2 we briefly describe character based and distance based reconstruction
methods. With long enough input sequences, these methods will succeed
in correctly reconstructing the tree, with high probability. But “long enough”
might be much longer than currently available sequences. What compounds this
problem is the fact that molecular data are not available evenly for all taxa of
interest. We then explain this data disparity problem, and its implications
for the above mentioned reconstruction methods. This motivates the introduction
of quartet based phylogenetic reconstruction. Next we specify the quartet
reconstruction problem, and quote known results on its computational
complexity. We then describe some algorithmic approaches and heuristics
for solving the problem. Finally, we describe some computational results
and present a few open problems.



2 Character and Distance Based Reconstruction 

2.1 Character-Based Methods 

A character-based method considers qualitative characters of the input taxa.
Any such character is a partition of the input set according to the value each
taxon takes. Each equivalence class defined thus is called a character state.
For example, a DNA sequence is composed of the 4 nucleotide characters A, C, T,
G. An RNA sequence is composed of the 4 nucleotide characters A, C, U, G.
Finally, a protein sequence is composed of the 20 amino acid characters A, C, D,
E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W and Y. Sequences originating
from different species are aligned via a multiple sequence alignment process.
Where needed, gaps are inserted into the sequences so as to maximize the
resemblance of the sequences to each other when laid out one on top of the
other. Thus, each position of an aligned amino acid sequence (called a site) is
a character with twenty-one states (20 amino acids and the gap symbol -).
Table 1 exhibits the result of running the multiple sequence alignment program
CLUSTAL W on four amino acid sequences. The output of the multiple sequence
alignment is an |S| × |C| matrix, where S is the taxa set and C is the character
set. (In the example of Table 1, |S| = 4 and |C| = 189.) Each entry denotes the
state which a particular taxon exhibits for a given character. Such a matrix
serves as the input for a character-based method.

Table 1. Multiple sequence alignment of the sequences for the protein insulin-like
growth factor II, corresponding to four species: (1) Human (PRI), (2) Mouse (MUR),
(3) Sheep (RUM), and (4) Chicken (OUT). An asterisk indicates complete agreement
at a given site, while a dot indicates significant biochemical similarity. Three letter
initials stand for the taxon to which each species belongs

(1) MGIPMGKSMLVLLTFLAFASCCIAAYRPSETLCGGELVDTLQFVCGDRGFYFSRPASRVS
(2) MGIPVGKSMLVLLISLAFALCCIAAYGPGETLCGGELVDTLQFVCSDRGFYFSRPSSRAN
(3) MGITAGKSMLALLAFLAFASCCYAAYRPSETLCGGELVDTLQFVCGDRGFYFSRPSSRIN
(4) MC-AARQILLLLLAFLAYALDSAAAYGTAETLCGGELVDTLQFVCGDRGFYFSRPVGRNN

(1) RRS-RGIVEECCFRSCDLALLETYCATPAKSERDVSTP-----PTVLPDNFPRYPVGKFF
(2) RRS-RGIVEECCFRSCDLALLETYCATPAKSERDVSTS-----QAVLPDDFPRYPVGKFF
(3) RRS-RGIVEECCFRSCDLALLETYCAAPAKSERDVSAS-----TTVLPDDFTAYPVGKFF
(4) RRINRGIVEECCFRSCDLALLETYCAKSVKSERDLSATSLAGLPALNKESFQKPSHAKYS

(1) QYDTWK-QSTQRLRRGLPALLRARRGHVLAKELEAFREA-KRHRPLIALPTQDPA-HGGA
(2) QYDTWR-QSAGRLRRGLPALLRARRGRMLAKELKEFREA-KRHRPLIVLPPKDPA-HGGA
(3) QSDTWK-QSIQRLRRGLPAFLRARRGRTLAKELEALREA-KSHRPLIALPTQDPATHGGA
(4) KYNVWQKKSSQRLQREVPGILRARRYRWQAEGLQAAEEARAMHRPLISLPSQRPP-APRA

(1) PPEMASNRK
(2) SSEMSSNHQ
(3) SSEASSD--
(4) SPEATGPQE

Given the aligned sequences (with inserted gaps) for each taxon, a natural
approach is to build a tree with the internal nodes, as well as the external
ones, labeled by sequences. The labels at the leaves are given as input, while
the tree topology and the internal labels are computed by the algorithm. The
goal is to minimize the number of “mutations” along tree edges that are
required in order to explain the data at the leaves. This leads to a
minimization problem, where the objective function is the sum of Hamming
distances between neighboring sequences along tree edges. More refined
approaches give different prices to different changes. This “price” measures
the log likelihood of a local change, as not all point mutations are equally
likely. The sum of costs of aligned locations determines a cost function for
any pair of sequences. The global optimization criterion for maximum parsimony
is minimizing the sum of costs between neighboring sequences. The general
maximum parsimony problem is NP-hard, but finding the internal labeling for a
given topology can be done efficiently. This leads to algorithms which scan
many different topologies and output the most parsimonious one.
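
To illustrate the remark that internal labels can be computed efficiently once
the topology is fixed, the following Python sketch applies Fitch's classical
procedure to each site under unit (Hamming) costs. Fitch's procedure is not
named in the text and is used here only as an assumed stand-in for "an
efficient method"; the nested-pair tree encoding and the function names are
hypothetical.

```python
def fitch_site_cost(tree, states):
    """Minimum number of state changes for one site on a rooted binary tree.

    tree   -- a leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its state at this site
    Returns (candidate_state_set, cost) for the root of `tree`.
    """
    if isinstance(tree, str):                 # leaf: its state is given as input
        return {states[tree]}, 0
    left_set, left_cost = fitch_site_cost(tree[0], states)
    right_set, right_cost = fitch_site_cost(tree[1], states)
    common = left_set & right_set
    if common:                                # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site costs over all columns of an alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for site in range(length):
        column = {taxon: seq[site] for taxon, seq in alignment.items()}
        total += fitch_site_cost(tree, column)[1]
    return total


# First five sites of rows (1)-(4) of Table 1; the gap counts as a 21st state.
alignment = {"human": "MGIPM", "mouse": "MGIPV",
             "sheep": "MGITA", "chicken": "MC-AA"}
tree = (("human", "mouse"), ("sheep", "chicken"))
print(parsimony_score(tree, alignment))
```

A search over topologies would call `parsimony_score` once per candidate tree
and keep the tree with the smallest value.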

Another approach tries to construct a tree such that for each character, the
node-sets that correspond to any character state form a connected subgraph.
Maximizing the number of characters for which this is true is called the
maximum compatibility problem. This problem, too, is NP-hard. A phylogeny
satisfying both maximum parsimony and maximum compatibility is called a perfect
phylogeny. Deciding if the data supports a perfect phylogeny was shown to be
NP-hard (by an equivalence to the triangulating colored graphs problem, and by
a reduction from the betweenness problem). Although hard in the general case,
polynomial-time algorithms for perfect phylogeny exist in cases where some of
the input parameters (the number of states or the number of characters) are
fixed.

2.2 Distance-Based Methods 

A distance-based phylogeny method would typically start by performing all
pairwise alignments of the input sequences. For each alignment the edit distance
between the two sequences is computed. This gives rise to a symmetric,
zero-diagonal |S| × |S| distance matrix. The goal is to produce a phylogeny
whose induced metric represents the input data in the best way possible. If the
input distance matrix M is realizable by a tree and its induced path lengths,
then M is said to be additive. The special case where all leaves have the same
distance from the root is called an ultrametric. Ultrametric trees correspond
to the biological theory that substitutional events in different species occur
at the same rate (this is the “molecular clock” assumption, nowadays widely
discredited).
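
Additivity can be tested directly on the matrix. The sketch below uses the
four-point condition, a standard characterization of tree metrics that the
text does not state explicitly; the function name, the tolerance parameter and
the example matrices are illustrative assumptions.

```python
from itertools import combinations


def is_additive(dist, tol=1e-9):
    """Four-point condition: for every four taxa, the two largest of the sums
    d(a,b)+d(c,d), d(a,c)+d(b,d), d(a,d)+d(b,c) must be equal."""
    n = len(dist)
    for a, b, c, d in combinations(range(n), 4):
        sums = sorted([dist[a][b] + dist[c][d],
                       dist[a][c] + dist[b][d],
                       dist[a][d] + dist[b][c]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Distances realized by the tree ((0,1),(2,3)) with unit pendant edges
# and an internal edge of length 2.
tree_metric = [[0, 2, 4, 4],
               [2, 0, 4, 4],
               [4, 4, 0, 2],
               [4, 4, 2, 0]]

# The same matrix with one perturbed entry, as real-life input might look.
noisy = [row[:] for row in tree_metric]
noisy[0][2] = noisy[2][0] = 5

print(is_additive(tree_metric), is_additive(noisy))
```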

Given an additive metric, constructing the tree is easy. The problem is that
real-life input is erroneous. Some of the errors are inherent in the assumed
model of evolution. Therefore, we seek phylogenies whose induced metric
approximates the “best possible” tree metric under some criterion. Again, most
of the problems in this domain are NP-hard.

One popular heuristic approach is agglomerative clustering. It works
iteratively, and at each iteration two nodes are joined into one parent node.
Examples include the popular Neighbor Joining method, the Fitch-Margoliash
method, and BIONJ.

As it turns out, approximating ultrametrics under the $L_\infty$ criterion is
doable in polynomial time. Consequently, there are algorithms that use an
approximated ultrametric to produce an approximation of the additive metric.



3 Motivation for Quartet Based Reconstruction 

We demonstrate the data disparity problem by considering the example of
mammals. The species most extensively studied is humans. Next in popularity
come model species like rats and mice, and certain species of economic or
agricultural importance, like cows and sheep. On the other hand, neither
government nor industry is going to finance an armadillo genome project in the
foreseeable future. Thus we are doomed to stay with significant disparities in
available sequence data between different taxa for many years to come.

Table 2. An illustration of the data disparity problem

To illustrate the situation, Table 2 depicts a portion of the available data
matrix for 6 taxa (5 mammalian taxa and one non-mammalian “outgroup”) vs. 15
protein sequences. The rows correspond to specific proteins, while the columns
correspond to specific taxa. A “+” in the (i, j) entry indicates that the i-th
protein sequence is known for the j-th taxon, while a “-” indicates that it is
unknown. The three letter initials stand for the following taxa: PRI = Primates
(humans, apes and monkeys), MUR = Muridae (rats, mice, hamsters, voles), RUM =
Ruminantia (cows and sheep, but not pigs or horses), LAG = Lagomorpha (rabbits
and hares), INS = Insectivora (hedgehogs and shrews), OUT = Outgroup^1
(chicken). In this small table, all 15 protein sequences are known for PRI, 10
are known for LAG, and just 2 are known for INS. In actuality, there are even
more “-” entries than what is implied by Table 2. Looking at actual numbers
[32], as recently extracted from the HOVERGEN database with respect to 23
mammalian taxa and one outgroup taxon, the number of sequenced proteins varies
from 621 for Primates down to 3 for Chrysochloridae (golden moles). This gives
a mean of 135 and a median of 19 proteins with known sequence per taxon. The
number of taxa for which a specific protein was sequenced varies in a similar
fashion. Most proteins, totaling 363, are sequenced for only 4 taxa, while only
3 are sequenced for as many as 20 taxa (out of 24 taxa).

^1 The outgroup is an auxiliary input which assists in determining the root of
the phylogenetic tree. See Section 4.3 for more details.

Given the data disparity problem, molecular phylogeneticists must frequently
decide on a trade-off between the number of taxa and the amount of molecular
data used in a study. If we restrict ourselves only to sequences that are
common to all the taxa under study, we may end up ignoring the vast majority of
data. The tree constructed this way will be strongly biased towards the
evolution of the few overrepresented proteins. If, on the other hand, we insist
on taxa for which a large number of protein sequences are known, we end up with
a small number of taxa (this is referred to as “taxonomic sampling”). Both
character-based methods and distance-based methods have as their starting point
a list of sequences that are known for all taxa under study. Therefore, both
methods are affected by the data disparity problem. This implies that in order
to utilize all available data, a different approach is called for.

One way to try to circumvent the data disparity problem is to make, for each
taxon, a long sequence which is the concatenation of individual sequences.
Where sequence information is not known, we concatenate the appropriate number
of “missing data” symbols. These long sequences could then serve as a basis for
either character-based or distance-based methods. The problem with this
approach is that most of these long concatenated sequences will consist mainly
of “missing data” symbols. As a result, the quality of the resulting trees
tends to be poor.

An approach that tries to utilize all available data while avoiding the problem
of taxonomic sampling is the four taxon approach suggested in [21] and used in
other studies. The key idea is to consider small subsets of taxa (say of size
$\ell$), one at a time. For each such subset, take all proteins that are known
for all taxa in the subset. Using this collection of common sequences, apply
either maximum parsimony or distance-based methods to infer the phylogeny of
each subset. Since small phylogenies are easier to infer than large ones, this
step is computationally feasible. The advantage of this approach is that each
protein that is common to at least $\ell$ taxa will influence the tree
construction. Such a protein need not be common to all n taxa. This means that
many more sequences will be utilized, and fewer will be “wasted”.
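
The selection of common proteins for each small subset of taxa can be sketched
as follows. The availability table, the protein names and both function names
are hypothetical and serve only to illustrate the idea; they are not data from
the study.

```python
from itertools import combinations


def common_proteins(known, subset):
    """Proteins whose sequence is known for every taxon in `subset`."""
    return set.intersection(*(known[taxon] for taxon in subset))


def usable_subsets(known, size=4):
    """Map each `size`-subset of taxa to the proteins it can be analysed with."""
    result = {}
    for subset in combinations(sorted(known), size):
        shared = common_proteins(known, subset)
        if shared:                      # skip subsets with no common protein
            result[subset] = shared
    return result


# Hypothetical availability table: taxon -> set of sequenced proteins.
known = {
    "PRI": {"p1", "p2", "p3", "p4"},
    "MUR": {"p1", "p2", "p3"},
    "RUM": {"p1", "p2"},
    "LAG": {"p2", "p4"},
    "OUT": {"p1", "p2", "p4"},
}

print(common_proteins(known, known))    # proteins shared by all five taxa
for subset, proteins in usable_subsets(known).items():
    print(subset, sorted(proteins))
```

In this toy table only one protein is shared by all five taxa, while several
four-taxon subsets can draw on two proteins each, which is the effect the
four taxon approach exploits.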

What kind of information can one expect to get? Different subsets will usually
share different proteins. The rates of evolution for different proteins might
differ substantially. So we cannot use any metric (or distance) information
across different subsets in a uniform way. What we get is just topological
information: unrooted trees with $\ell$ leaves each. It is clear that smaller
$\ell$ leads to better utilization of sequence information. But how small can
$\ell$ be? A tree on $\ell = 2$ leaves is simply one edge connecting the two
leaves. For $\ell = 3$, there is only one unrooted tree on three leaves, the
star. So neither $\ell = 2$ nor $\ell = 3$ yields informative data. The minimal
value of $\ell$ which can give rise to an informative, input-dependent tree
topology is $\ell = 4$. For each subset of 4 taxa, there are three possible
unrooted bifurcating phylogenetic trees. Such a quadruple of taxa with an
associated bifurcating topology is called a quartet. There is one more topology,
the star, which is multifurcating. Figure 1 depicts these possibilities. A
quartet can be viewed as a partition of the four taxa into two pairs of taxa
(e.g., {a, b} and {c, d}). This subdivision expresses the most supported
topology, given the sequence data common to all four taxa. Such a quartet is
denoted ab|cd.

Fig. 1. The possible unrooted trees for the taxa set {a, b, c, d}. Three
bifurcating quartets (top), and the multifurcating star (bottom)
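
Viewing a quartet as a partition into two pairs suggests a simple encoding. The
sketch below enumerates the three bifurcating topologies on four taxa and
normalizes a quartet so that different spellings of the same partition compare
equal; the representation and the function names are illustrative choices, not
notation from the text.

```python
def quartet_topologies(a, b, c, d):
    """The three bifurcating topologies on {a, b, c, d}, each as a pair of pairs."""
    return [
        ((a, b), (c, d)),   # ab|cd
        ((a, c), (b, d)),   # ac|bd
        ((a, d), (b, c)),   # ad|bc
    ]


def normalize(quartet):
    """Order-independent form, so that ab|cd, ba|dc and cd|ab compare equal."""
    left, right = quartet
    return frozenset({frozenset(left), frozenset(right)})


for topology in quartet_topologies("a", "b", "c", "d"):
    (p, q), (r, s) = topology
    print(f"{p}{q}|{r}{s}")
```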

After the list of quartets has been determined, the goal of quartet-based
reconstruction is to find a phylogenetic tree which maximizes the number of
“satisfied quartets”. Combinatorially, the justification for the quartet-based
approach is that if all $\binom{n}{4}$ quartets are given with the right
topology, then the underlying tree is uniquely determined, and can be
efficiently constructed. Of course, in reality there are errors, so a tree
consistent with all given quartets may not exist. In the next section we define
the problem exactly, and discuss its computational complexity.



4 Problem Description and Complexity 

The problem is defined over a set of n taxa, numbered 1, ..., n. The input
consists of a set of k quartets. We denote the associated taxa and partition
for the j-th quartet by $a_j b_j | c_j d_j$. No two quartets share the same set
of four taxa. Each input quartet is accompanied by a positive weight, denoted
by $C_j$, which represents the confidence in the quartet topology. We will
first describe how these confidence values are computed.

After each four-taxon phylogenetic tree is inferred, a measure of reliability
or confidence for this topology is assessed. The confidence value is a real
number in the range [0,1]. It should reflect two factors: how solid the
plurality of sequences supporting the quartet topology is (i.e., the strength
of the phylogenetic signal), and the size of the sequence population used to
build the tree. Bootstrap is a common method of computing the confidence
values. Random positions of the original sequences are chosen independently
(with repetition) to generate sequences whose length equals the original
length. For each such re-sampling, a phylogenetic tree is computed. The
bootstrap value equals the fraction of these reconstructions that yield the
original quartet topology (notice that this value can be lower than 1/3).
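
The resampling scheme just described can be sketched as below. The tree
inference step is deliberately left as a caller-supplied function, since the
text allows any four-taxon method; `infer_topology`, the replicate count and
the seed are assumptions made for illustration.

```python
import random


def bootstrap_support(alignment, infer_topology, replicates=100, seed=0):
    """Fraction of resampled alignments that reproduce the original topology.

    alignment      -- dict taxon -> aligned sequence (all of equal length)
    infer_topology -- caller-supplied function mapping an alignment to a
                      comparable topology (hypothetical placeholder)
    """
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    original = infer_topology(alignment)
    hits = 0
    for _ in range(replicates):
        # Choose column indices independently, with repetition.
        columns = [rng.randrange(length) for _ in range(length)]
        resampled = {taxon: "".join(seq[i] for i in columns)
                     for taxon, seq in alignment.items()}
        if infer_topology(resampled) == original:
            hits += 1
    return hits / replicates
```

The returned fraction would play the role of the confidence value attached to
the quartet inferred from the original alignment.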



4.1 Specifications 

The output in the quartet-based reconstruction problem is an unrooted tree with
n leaves, which are labeled by the input taxa. Given a tree T and a quadruple
{a, b, c, d}, we can compute the quartet topology induced by T, using the
following procedure. First, all leaves but a, b, c and d are deleted from the
tree. Edges adjacent to these deleted leaves are also removed. Next, internal
nodes with degree two are contracted and deleted, so their two adjacent nodes
become connected. This process is repeated until no internal nodes of degree
two are left. As we observed earlier, there are four possible induced
topologies for the quadruple: the three quartets and the star topology (which
can be induced only by a multifurcating tree). Given a specific quartet and the
induced four taxa subtree, we say that the quartet is unresolved if the tree
induces the star topology on the four taxa. Otherwise, the quartet is either
satisfied (if the topology induced by the tree equals the quartet’s topology),
or violated.
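
The three possible outcomes can also be decided without performing the
deletions and contractions literally. The sketch below relies on an equivalent
test that is an assumption of this example rather than the procedure of the
text: a tree induces ab|cd exactly when the path from a to b shares no node
with the path from c to d. The adjacency-dictionary encoding and the function
names are hypothetical.

```python
def tree_path(adj, start, goal):
    """Set of nodes on the unique path between two nodes of a tree."""
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        if node == goal:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                stack.append(nxt)
    path = set()
    node = goal
    while node is not None:             # walk back from goal to start
        path.add(node)
        node = parent[node]
    return path


def quartet_status(adj, quartet):
    """Return 'satisfied', 'violated' or 'unresolved' for quartet ((a,b),(c,d))."""
    (a, b), (c, d) = quartet

    def separated(p, q, r, s):
        return tree_path(adj, p, q).isdisjoint(tree_path(adj, r, s))

    if separated(a, b, c, d):
        return "satisfied"
    if separated(a, c, b, d) or separated(a, d, b, c):
        return "violated"
    return "unresolved"                 # all paths meet: star topology


# Unrooted tree ((a,b),c,(d,e)) with internal nodes x, y, z.
adj = {"a": ["x"], "b": ["x"], "c": ["y"], "d": ["z"], "e": ["z"],
       "x": ["a", "b", "y"], "y": ["x", "c", "z"], "z": ["y", "d", "e"]}
print(quartet_status(adj, (("a", "b"), ("c", "d"))))
print(quartet_status(adj, (("a", "c"), ("b", "d"))))
```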

Given a tree T and a set of quartets Q, we would like to know how well T
represents Q. To do this, we find the subset of quartets $S \subseteq Q$ that
are satisfied by T, and the subset of quartets $U \subseteq Q$ that are
unresolved by T. We now define the score of the tree as follows:

$$\mathrm{score}_Q(T) = \sum_{s \in S} C_s + \frac{1}{3} \sum_{u \in U} C_u .$$

That is, we add the confidence weights of the satisfied quartets, plus one third
of the weights of the unresolved quartets. This latter term was chosen because
there are three possible pairings for every quadruple. Therefore this term equals
the expected increase to the tree score that will result from a random bifurcation
of the tree (performed at nodes with more than two descendants). Even though
we prefer to construct bifurcating trees, we introduce a measure which is “fair”
with respect to multifurcating trees as well. We are now ready to formulate the
problem precisely.
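
As a small worked illustration of the score, the sketch below adds the weights
of satisfied quartets and one third of the weights of unresolved ones. It
assumes that the status of every quartet with respect to the tree has already
been determined elsewhere; the quartet labels and weights are made up for the
example.

```python
from fractions import Fraction


def tree_score(weights, status):
    """Weights of satisfied quartets plus one third of the unresolved ones.

    weights -- dict quartet -> confidence weight
    status  -- dict quartet -> 'satisfied' | 'violated' | 'unresolved'
               (how the tree relates to each quartet; assumed precomputed)
    """
    total = Fraction(0)
    for quartet, weight in weights.items():
        if status[quartet] == "satisfied":
            total += Fraction(weight)
        elif status[quartet] == "unresolved":
            total += Fraction(weight) / 3
    return total


# Hypothetical input: three weighted quartets and their status in some tree.
weights = {"ab|cd": Fraction(9, 10), "ac|be": Fraction(3, 5), "ad|ce": Fraction(1, 2)}
status = {"ab|cd": "satisfied", "ac|be": "unresolved", "ad|ce": "violated"}
print(tree_score(weights, status))
```

Exact fractions are used only to keep the one-third term free of rounding;
floating point weights would work equally well in practice.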

Definition 1. The quartet-based reconstruction problem is defined as follows:
Given a set of quartets Q, together with associated confidence scores, find a tree
T which maximizes $\mathrm{score}_Q(T)$.



4.2 Complexity 

Suppose we are given a full list of $\binom{n}{4}$ quartets over n taxa (one
quartet topology for each quadruple). Then it is easy to check if there is a
bifurcating tree which satisfies all quartets, and to construct this (unique)
tree if it exists. We first identify a pair of “sibling taxa”, contract them,
update their associated quartets, and continue iteratively. If we get stuck,
the data is inconsistent. However, given $k < \binom{n}{4}$ quartets, the
decision problem “is there a tree which satisfies all quartets” is NP-complete.
Given a list of weighted quartets Q, an upper bound on the score of any tree is
$\sum_{j=1}^{k} C_j$. This upper bound can be achieved only if there exists a
tree that satisfies all quartets. Given $k < \binom{n}{4}$ quartets, we assign
weight 1 to each quartet in the list, and 0 to all remaining quartets. The
existence of a tree which satisfies all k quartets is therefore equivalent to
the existence of a tree whose score equals k. This immediately implies that
maximizing the score of a tree with respect to a partial (or weighted) list of
quartets is NP-hard. A careful look at Steel’s reduction from the betweenness
problem reveals that maximizing the score of a tree is even MAX SNP complete.
This means that there is an absolute constant $\varepsilon > 0$ such that
finding a tree whose score is at least a $(1 - \varepsilon)$ fraction of the
maximum score is NP-hard (even if there is a tree satisfying all the quartets).

NP-completeness and hardness results imply that there is little hope of finding
polynomial time algorithms. But such results are asymptotic, applicable to
“worst case” input instances. In particular, when trying to reconstruct
phylogenetic trees, it is worthwhile to consider what is a “typical” problem
size and what is the nature of “typical” input instances. While one may
consider trees with hundreds or thousands of taxa, it turns out that much
smaller values of n are still of biological interest. For example,
investigations of mammalian evolution often involve instances where n, the
number of taxa, is between 15 and 24. With such values, “mild” exponential time
algorithms may be of practical interest. In addition, while available sequence
data contain lots of errors, it is reasonable to expect that these errors are
not designed by an adversary, but are rather of some stochastic nature. One may
therefore hope that efficient heuristics can be effective in approaching
optimal solutions on such data.

Before proceeding to more advanced methods, we point out that simple,
exhaustive search methods seem infeasible even for modest values of n (say
n = 15). The number of unrooted bifurcating trees with n leaves is
$(2n-5)!/(2^{n-3}(n-3)!)$. For n = 15 the number of such trees is just below
$8 \times 10^{12}$. For a full list, the number of quartets is
$\binom{15}{4} = 1365$. So an algorithm which goes over all quartets for every
tree would take more than $10^{16}$ steps. Performing that many steps is not a
feasible task on “reasonable” contemporary machines.
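
The figures quoted for n = 15 can be checked with a few lines of Python. The
function name is illustrative; the formula is the one given in the text.

```python
from math import comb, factorial


def num_unrooted_bifurcating_trees(n):
    """(2n-5)! / (2^(n-3) * (n-3)!) for n >= 3 labeled leaves."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


trees = num_unrooted_bifurcating_trees(15)   # number of candidate topologies
quartets = comb(15, 4)                       # size of a full quartet list
print(trees, quartets, trees * quartets)
```

For four leaves the same function returns 3, matching the three bifurcating
quartet topologies discussed earlier.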

4.3 Rooting the Tree 

Recall that the goal of phylogeny reconstruction is to build a tree which best
represents the course of evolution. In particular, to understand which events
occurred earlier and which ones took place later, the tree should be rooted.
But the input to quartet-based reconstruction is a list of quartets, which are
inherently unrooted. How can one hope to construct a rooted tree given unrooted
information?

Indeed, to root the tree we need some auxiliary information, not included in
the quartets. This information is the identity of an outgroup among the n taxa.
This outgroup is an external taxon, which does not belong to the same family as
the other taxa. (In our example the outgroup is chicken, while all others are
mammalian.) If indeed the identification of the outgroup is correct, then this
taxon was the earliest to diverge from the remaining taxa. We first produce an
unrooted tree according to the optimization criteria. Then we root the tree by
forcing the outgroup to be an immediate descendant of the root.

We note that the problem of generating a rooted tree is considered hard for all
tree reconstruction methods. The use of an outgroup is a standard trick in the
other methods as well.

5 Algorithms 

Published quartet methods include the Buneman tree, the short quartet method,
neighbor-joining variants and quartet puzzling. The literature also offers a
comprehensive survey, as well as a different formulation of the problem, termed
split decomposition. Most of these heuristics deal with unweighted quartets
(i.e., each quartet has the same significance). Some also require as input the
full set of $\binom{n}{4}$ quartets. The general maximization problem with
weighted quartets is MAX SNP complete. However, Li et al. have recently
obtained a polynomial time approximation scheme for the “dense” version of the
problem: the unweighted case with a full list of $\binom{n}{4}$ quartets.

A “geometric” heuristic, based on semidefinite programming, and an “exact”
algorithm, based on dynamic programming, have also been described. These two
approaches are global, and can handle weighted quartets. Due to space
limitations, we describe in detail only three approaches: quartet puzzling, the
geometric heuristic, and the exact algorithm.

5.1 Quartet Puzzling 

Quartet puzzling applies a simple greedy procedure (called the “puzzling step”)
for combining quartets into bifurcating trees. To avoid local traps, this
puzzling step is repeated many times, and the order of the taxa is permuted at
random in each repetition. Finally, the many resulting trees are combined into
one tree (which could be multifurcating) by “majority consensus”.

Consider a run of the greedy procedure, and for simplicity assume the (random)
order of the taxa is a, b, c, d, e, f, .... The quartet topology for a, b, c, d
serves as an “anchor” for the tree. A counter is associated with each of the
five edges in this anchor, and all counters are initialized to 0. Then the next
taxon e is examined. The taxon e should be added to the current tree by
branching off one of the existing edges. Suppose ij|ke is a quartet from the
input, and i, j and k preceded e in the random order, so that these three taxa
already appear as leaves in the current tree. Consider the internal node O in
the tree where the three paths i-j, i-k and j-k intersect. Deleting O splits
the tree into three disjoint subtrees. In order not to violate the ij|ke
quartet, e should branch off some edge in the subtree containing k. Branching
off an edge from either the “i subtree” or the “j subtree” would violate ij|ke.
To indicate this violation, the counter of every edge in these two subtrees is
incremented by 1. The process repeats for other such quartets involving e. At
the end of the current step the counter of every edge in the current tree
contains a non-negative integer, indicating how many quartets involving e will
be violated if e is to branch off this edge. The edge with minimum count is
chosen (if there are two or more such edges, one is chosen at random). After e
is added to the tree, e does not change its place with respect to previous
taxa. Then, all counters are zeroed, and the next taxon f is examined. This
greedy process is repeated until all n taxa are examined and placed.
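
The counter update for a single new taxon can be sketched as follows. This is
an illustrative reading of the step described above, not the published
implementation: the adjacency-dictionary tree, the function names, and the
choice to count the edge joining O to a penalized subtree as part of that
subtree are assumptions of the sketch.

```python
from collections import deque


def component_without(adj, start, removed):
    """Nodes reachable from `start` once node `removed` is deleted."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def median_node(adj, i, j, k):
    """The node O whose deletion puts i, j and k into three different subtrees."""
    for node in adj:
        if node in (i, j, k):
            continue
        side_i = component_without(adj, i, node)
        side_j = component_without(adj, j, node)
        if j not in side_i and k not in side_i and k not in side_j:
            return node
    raise ValueError("no node separates the three taxa")


def puzzling_penalties(adj, new_taxon, quartets):
    """Edge counters for inserting `new_taxon`; quartets are ((p, q), (r, s))."""
    counters = {frozenset((u, v)): 0 for u in adj for v in adj[u]}
    for pair1, pair2 in quartets:
        if new_taxon in pair1:                  # orient the quartet as ij|ke
            pair1, pair2 = pair2, pair1
        if new_taxon not in pair2:
            continue
        i, j = pair1
        k = pair2[0] if pair2[1] == new_taxon else pair2[1]
        if not all(t in adj for t in (i, j, k)):
            continue                            # some taxon not placed yet
        centre = median_node(adj, i, j, k)
        bad = component_without(adj, i, centre) | component_without(adj, j, centre)
        for edge in counters:
            if edge & bad:                      # edge in the i- or j-subtree
                counters[edge] += 1
    return counters


# Anchor tree for ab|cd with internal nodes x and y; taxon e is added next.
adj = {"a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"y"},
       "x": {"a", "b", "y"}, "y": {"c", "d", "x"}}
quartets = [(("a", "b"), ("c", "e")), (("a", "b"), ("d", "e")),
            (("a", "c"), ("d", "e")), (("b", "c"), ("d", "e"))]
counters = puzzling_penalties(adj, "e", quartets)
print(min(counters, key=counters.get))
```

In this example the edge leading to d receives no penalty, so e would branch
off that edge. To handle weighted input, the increment of 1 would be replaced
by the weight of the quartet.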

Both the puzzling step and majority consensus are extremely efficient, so
quartet puzzling is a fast heuristic with good reported results. As originally
presented, the quartet puzzling algorithm requires unweighted input of all
$\binom{n}{4}$ quartets. However, if we update the counter of violating edges
by the weight of the violated quartet (rather than by 1), weighted input can be
handled. Thus, it seems that quartet puzzling is applicable to the general,
weighted case as well. However, to our knowledge no experiments in this
direction have been done.



5.2 The Geometric Heuristic 

The geometric heuristic is a “global” method, which assigns to each taxon a
point on the boundary of the unit sphere in $\mathbb{R}^n$. The embedding is
built using “hints” from the input set. For example, if the quartet ab|cd is in
the list, we try to place a and b close to each other, but a and d far apart.
Once the points are embedded in $\mathbb{R}^n$, a clustering heuristic converts
the embedding into a tree.

Semidefinite Programming: The paradigm of semidefinite programming (SDP) plays
an important role in combinatorial optimization and in many approximation
algorithms. Let $\langle v_i, v_j \rangle$ denote the inner product of two
points $v_i$ and $v_j$ in $\mathbb{R}^n$. We can use SDP to efficiently solve
(with any desirable precision) any optimization problem of the following form
(the $c_{ij}$, $a_{ij}^{(r)}$, $b^{(r)}$ are reals):

Find n vectors $v_1, \ldots, v_n \in \mathbb{R}^n$ so as to maximize the
quantity

$$\sum_{i,j} c_{ij} \langle v_i, v_j \rangle ,$$

subject to constraints that are linear in the inner products
$\langle v_i, v_j \rangle$, with coefficients $a_{ij}^{(r)}$ and right-hand
sides $b^{(r)}$.

We will use this formulation of SDP to find an embedding of n points which
represent the n taxa.

Geometric embedding of taxa: We will generally denote the embedding of taxon i
by the point $v_i$ in $\mathbb{R}^n$. Given a list of k quartets
$a_j b_j | c_j d_j$ and confidence values $C_j$ (j = 1, ..., k), we solve the
following semidefinite program, in which a taxon name stands for its embedded
point:

maximize:

$$\sum_{1 \le j \le k} C_j \left( \langle a_j, b_j \rangle + \langle c_j, d_j \rangle \right)
 - 0.5 \sum_{1 \le j \le k} C_j \left( \langle a_j, c_j \rangle + \langle a_j, d_j \rangle
 + \langle b_j, c_j \rangle + \langle b_j, d_j \rangle \right)$$

subject to

$$\langle v_i, v_i \rangle = 1 \quad (1 \le i \le n)$$

The n constraints force the points $v_i$ to the boundary of the unit sphere.
The quartets’ input is incorporated into the objective function, which combines
many local requirements into a single expression. (This approach was motivated
by the approximation algorithm for MAX CUT.) Consider the quartet ab|cd with
confidence level C. Its maximum contribution to the objective function occurs
when a and b are placed at the same point, while both c and d are placed at the
antipodal point on the unit sphere. In this case
$\langle a, b \rangle = \langle c, d \rangle = 1$, while
$\langle a, c \rangle = \langle a, d \rangle = \langle b, c \rangle = \langle b, d \rangle = -1$.
The overall contribution of the quartet will be 4C with this embedding. The
worst embedding places a and c together, and b and d at the antipodal point. In
this case $\langle a, b \rangle = \langle c, d \rangle = -1$ and the four cross
terms cancel, so the quartet contributes $-2C$ to the objective function. The
semidefinite program will therefore look for a global embedding which maximizes
“good” quartet placements and tries to avoid “bad” ones.
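
The contribution of a single quartet can be evaluated numerically. The sketch
below assumes that the per-quartet term of the objective has the form
C(⟨a,b⟩ + ⟨c,d⟩ − 0.5(⟨a,c⟩ + ⟨a,d⟩ + ⟨b,c⟩ + ⟨b,d⟩)), which is how the
objective above is read here; the function names and the two test embeddings
are illustrative.

```python
def inner(u, v):
    """Inner product of two points given as coordinate tuples."""
    return sum(x * y for x, y in zip(u, v))


def quartet_contribution(weight, a, b, c, d):
    """weight * (<a,b> + <c,d> - 0.5 * (<a,c> + <a,d> + <b,c> + <b,d>))."""
    same = inner(a, b) + inner(c, d)
    cross = inner(a, c) + inner(a, d) + inner(b, c) + inner(b, d)
    return weight * (same - 0.5 * cross)


north, south = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)

# a and b at one point, c and d at the antipodal point.
best = quartet_contribution(1.0, north, north, south, south)
# a and c at one point, b and d at the antipodal point.
worst = quartet_contribution(1.0, north, south, north, south)
print(best, worst)
```

With unit weight, the first placement evaluates to 4 and the second to −2
under this assumed form of the objective.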

Experiments with this approach showed that small variants of it are helpful in
improving the final result (the produced tree). One such variant is ignoring
quartets with low confidence in the objective function. Only quartets with
confidence level above a certain threshold (90% for example) are included. One
possible explanation why this may prove helpful is that low confidence quartets
probably carry most of the inconsistencies, and their inclusion may lower the
value of the objective function. It was also discovered that by imposing
additional constraints, one gets (small) improvements in the score of the tree.
For example, one can force the points to maintain some small pairwise distance
by adding the $\binom{n}{2}$ constraints

$$\langle v_i, v_j \rangle \le 1 - \varepsilon \quad (1 \le i < j \le n)$$

for some positive $\varepsilon$ (say $\varepsilon = 0.25$). This “represses” the
SDP’s tendency to find embeddings where points are very close to each other.
This tendency has a negative effect on the tree building method, which we
describe next.

Geometrical Clustering: Having solved the SDP problem, one seeks a tree that
reflects the geometric data. For instance, if two points reside in the same
geometric vicinity, they should have a common ancestor which is not too high up
the resulting tree. To this end, we employ a simple clustering heuristic. The
program is initialized with n clusters, each containing a single point. An
invariant kept by the algorithm is that each cluster has a point in
$\mathbb{R}^n$ associated with it. This is the case at initialization, and
remains true as new clusters are formed. At each step of the algorithm the
number of clusters is decreased by one, by removing two “old” clusters and
adding a new one. This defines a tree, as the newly added node represents the
parent of the two deleted nodes in the output tree. The selection of the
clusters to be removed is done by calculating the pairwise distances between
points associated with clusters, and selecting the pair having the shortest
Euclidean distance. The point associated with the new cluster is the center of
mass of the points of the removed clusters (the “mass” of a point being the
number of taxa it represents). When the number of clusters reaches one, the
tree is completed. The resulting tree is rooted, but we disregard the rooting,
since the input is inherently unrooted. To root the tree we use the identity of
the outgroup taxon. We remark that this clustering heuristic is a special case
of the general neighbor joining approach.
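
The clustering heuristic can be sketched in a few lines. The nested-pair output
format, the function name and the example coordinates are assumptions of this
sketch; `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def cluster_points(points):
    """Greedy clustering: repeatedly merge the two closest cluster points.

    points -- dict taxon -> coordinate tuple
    Returns a nested pair structure describing the (rooted) merge tree.
    """
    # Each cluster: (subtree, associated point, number of taxa it represents).
    clusters = [(name, tuple(p), 1) for name, p in points.items()]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: math.dist(clusters[ij[0]][1], clusters[ij[1]][1]))
        (t1, p1, m1), (t2, p2, m2) = clusters[i], clusters[j]
        # The new point is the center of mass of the two removed points.
        centre = tuple((m1 * x + m2 * y) / (m1 + m2) for x, y in zip(p1, p2))
        merged = ((t1, t2), centre, m1 + m2)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical embedding of four taxa on the unit circle.
points = {"a": (1.0, 0.0), "b": (0.8, 0.6), "c": (-1.0, 0.0), "d": (-0.6, -0.8)}
print(cluster_points(points))
```

The root of the returned structure is an artifact of the merge order; as noted
above, it would be discarded and replaced by the outgroup-based rooting.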



5.3 The “Exact” Algorithm 

Since the underlying problem is NP-hard we cannot hope to give a polynomial
time algorithm to solve it optimally. However, in this subsection we show how
an optimal phylogenetic tree can be found, using a dynamic programming
algorithm, with a “mild” exponential running time. The method is applicable to
instances of modest size (say n < 21).

The following discussion deals with rooted bifurcating trees. For a node v, its
left and right children will be denoted by $v_l$ and $v_r$ respectively. Given
a rooted tree T and a node v in it we denote by T(v) the subtree of T rooted at
v. We denote by L(T) the set of leaves (i.e., taxa) of the tree T. For a pair
of nodes u, v the least common ancestor of u and v, lca(u, v), is defined as an
ancestor p of both u and v such that no node in T(p) other than p is an
ancestor of both u and v. (This definition is extendible to a larger number of
nodes.)

Definition 2. Given a quartet q = ab|cd and a tree T, the quartet least common
ancestor of q, qlca(q), is defined as a node p that is the lca of two or more
pairs of elements from {a, b, c, d}, such that no node in T(p) except p is the
lca of two or more pairs of elements from {a, b, c, d}.

The following equivalent definition is helpful for implementing the algorithm. 

Definition 3. Given a quartet q = ab|cd and a tree T, the qlca of q is a node
p such that

1. $|L(T(p)) \cap \{a, b, c, d\}| \ge 3$.

2. For any child s of p, $|L(T(s)) \cap \{a, b, c, d\}| \le 2$.

It is not hard to see that every quartet q has a unique qlca(q). Figure 2
illustrates two qlca arrangements. The first arrangement demonstrates a case
where the qlca is different from the least common ancestor of the four leaves.

Lemma 4. Given a tree $T$ and a quartet $q$, the subtree rooted at $\mathrm{qlca}(q)$
determines whether $q$ is satisfied in the tree $T$.

Fig. 2. Two possible arrangements of a qlca for quartets over {a, b, c, d}



Corollary 5. Given a quartet $q = ab|cd$ and a tree $T$, let $v = \mathrm{qlca}(q)$. Then $T$
satisfies $q$ if and only if at least one of the following holds:

1. $\{a, b\} \subseteq L(T(s))$.

2. $\{c, d\} \subseteq L(T(s))$.

Here $s$ is either $v$'s left child ($s = v_l$) or $v$'s right child ($s = v_r$).
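
Definition 3 and Corollary 5 translate almost directly into code. The following is a minimal sketch, assuming a rooted bifurcating tree is encoded as nested 2-tuples with taxon names at the leaves, and a quartet ab|cd as the tuple (a, b, c, d). The encoding and the function names are illustrative assumptions, not part of the original implementation.

```python
def leaves(tree):
    """Return L(T): the set of taxa at the leaves of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def qlca(tree, quartet):
    """Find the qlca by Definition 3: at least 3 quartet taxa below the node,
    at most 2 below each of its children. Returns None if the tree holds
    fewer than 3 of the quartet's taxa."""
    taxa = set(quartet)
    if len(leaves(tree) & taxa) < 3:
        return None
    node = tree
    while True:
        # At most one child can hold 3 or more of the 4 taxa.
        heavy = [child for child in node if len(leaves(child) & taxa) >= 3]
        if not heavy:
            return node
        node = heavy[0]


def satisfies(tree, quartet):
    """Corollary 5: ab|cd is satisfied iff {a, b} or {c, d} lies entirely
    below one child of the qlca."""
    a, b, c, d = quartet
    v = qlca(tree, quartet)
    if v is None:
        return False
    return any({a, b} <= leaves(s) or {c, d} <= leaves(s) for s in v)
```

For the tree ((("a", "b"), "c"), ("d", "e")), the qlca of ab|cd is the subtree (("a", "b"), "c"), which is not the least common ancestor of all four leaves; this is the situation of the first arrangement in Fig. 2.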

The Algorithm: Let $Q$ be a fixed set of input quartets. Let $T$ be a rooted
tree, and $v$ a node in $T$. We denote by $\mathrm{SAT}_Q(T(v))$ the set of quartets
$q \in Q$ such that $q$ is satisfied by $T$, and $\mathrm{qlca}(q)$ is a node in $T(v)$. Let
$\mathrm{TOP}_Q(T(v)) \subseteq \mathrm{SAT}_Q(T(v))$ be the set of quartets in $Q$ that have $v$
as their qlca and are satisfied by $T$. The following equality describes a partition
of $\mathrm{SAT}_Q(T(v))$ into three disjoint subsets:

$$\mathrm{SAT}_Q(T(v)) = \mathrm{TOP}_Q(T(v)) \cup \mathrm{SAT}_Q(T(v_l)) \cup \mathrm{SAT}_Q(T(v_r)) . \qquad (1)$$

(The equality follows from Lemma 4.) For a set $A \subseteq Q$ of quartets, let
$\mathrm{sum}(A)$ denote the sum of their weights. The score of the subtree $T(v)$ (with
respect to $Q$) is defined as

$$\mathrm{score}_Q(T(v)) = \mathrm{sum}(\mathrm{SAT}_Q(T(v))) .$$

By Equation (1),

$$\mathrm{score}_Q(T(v)) = \mathrm{sum}(\mathrm{TOP}_Q(T(v))) + \mathrm{score}_Q(T(v_l)) + \mathrm{score}_Q(T(v_r)) . \qquad (2)$$

Let $S$ be a set of three or more taxa. Denote by $\mathrm{opt\_score}_Q(S)$ the maximum
score with respect to $Q$ among all trees that have $S$ as their set of leaves. (By
the definition, trees with one or two leaves do not contain any qlca, so for sets
$S$ of size 1 or 2 we define $\mathrm{opt\_score}_Q(S) = 0$.) We denote by
$\mathrm{opt\_tree}_Q(S)$ a tree which attains the maximum score. For every proper
partition of $S$ into two subsets $S_1$ and $S_2$, let $T(S_1, S_2)$ denote a tree whose
left subtree equals $\mathrm{opt\_tree}_Q(S_1)$ and whose right subtree equals
$\mathrm{opt\_tree}_Q(S_2)$. By Equation (2) we have

$$\mathrm{score}_Q(T(S_1, S_2)) = \mathrm{sum}(\mathrm{TOP}_Q(T(S_1, S_2))) + \mathrm{opt\_score}_Q(S_1) + \mathrm{opt\_score}_Q(S_2) .$$

This implies that

$$\mathrm{opt\_score}_Q(S) = \max_{S_1 \cup S_2 = S} \bigl( \mathrm{sum}(\mathrm{TOP}_Q(T(S_1, S_2))) + \mathrm{opt\_score}_Q(S_1) + \mathrm{opt\_score}_Q(S_2) \bigr) . \qquad (3)$$

Let $(S_1, S_2)$ be a partition of $S$ which attains the maximum; then
$\mathrm{opt\_tree}_Q(S)$ is defined as $T(S_1, S_2)$.

Equation (3) yields a recursive algorithm to compute the optimal tree (with
respect to the given list of weighted quartets, $Q$): Given $Q$ and $S$, go over all
partitions $\{S_1, S_2\}$ of $S$, and choose a partition which maximizes

$$\mathrm{sum}(\mathrm{TOP}_Q(T(S_1, S_2))) + \mathrm{opt\_score}_Q(S_1) + \mathrm{opt\_score}_Q(S_2) .$$

This partition defines which taxa belong to the left subtree, and which to
the right one. Apply the procedure recursively until each subtree has size
smaller than 3. An optimal tree may be constructed by backtracking through the
partitioning steps.

The drawback of this recursive algorithm is that the score of each set S is 
recomputed whenever S is encountered as a subset in a partition. (Thus if S is 
of size i, its score will be computed times.) In order to avoid this wasteful 
repetition, we employ the dynamic programming paradigm. We keep a record of the
computed $\mathrm{opt\_score}_Q(S)$ values, so that we do not have to recompute them.
To do this, we scan the subsets $S \subseteq \{1, 2, \ldots, n\}$ by increasing size of
$S$. This guarantees that in the computation for a set $S$, all subsets of $S$ have
already been scanned. It can be shown that by employing dynamic programming, the
running time of the exact algorithm for an input of $k$ quartets over $n$ taxa is
$O(k 3^n)$, and that $O(2^n)$ memory is required.
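
The recurrence of Equation (3) can be sketched as a dynamic program over subsets encoded as bitmasks. This is an illustrative sketch under stated assumptions, not the authors' implementation: taxa are numbered 0..n-1, a quartet ab|cd with weight w is given as ((a, b, c, d), w), weights are assumed non-negative, and the names `top_weight` and `exact_tree` are made up for this example. Subsets are visited in increasing integer order rather than by increasing size; this also guarantees that every proper subset of a set is handled before the set itself.

```python
def top_weight(quartets, s1, s2):
    """Weight of the satisfied quartets whose qlca is the root of T(s1, s2)."""
    total = 0.0
    for (a, b, c, d), weight in quartets:
        ab = (1 << a) | (1 << b)
        cd = (1 << c) | (1 << d)
        in1 = bin((ab | cd) & s1).count("1")
        in2 = bin((ab | cd) & s2).count("1")
        # Definition 3: at least 3 quartet taxa below the root, at most 2 per child.
        if in1 + in2 >= 3 and in1 <= 2 and in2 <= 2:
            # Corollary 5: {a, b} or {c, d} lies entirely inside one child.
            if any(pair & side == pair for pair in (ab, cd) for side in (s1, s2)):
                total += weight
    return total


def exact_tree(n, quartets):
    """Return (opt_score, opt_tree) for taxa 0..n-1; trees are nested tuples."""
    full = (1 << n) - 1
    best = [0.0] * (full + 1)   # opt_score for every subset (bitmask)
    split = [0] * (full + 1)    # left part of a best partition, for backtracking
    for mask in range(1, full + 1):
        if mask & (mask - 1) == 0:
            continue            # a single taxon: score 0, nothing to split
        low = mask & -mask      # keep the lowest taxon in s1: each partition once
        rest = mask ^ low
        best[mask] = -1.0
        t = (rest - 1) & rest   # largest proper sub-mask of rest
        while True:
            s1 = low | t
            s2 = mask ^ s1
            score = top_weight(quartets, s1, s2) + best[s1] + best[s2]
            if score > best[mask]:
                best[mask], split[mask] = score, s1
            if t == 0:
                break
            t = (t - 1) & rest

    def build(mask):
        """Backtrack the recorded partitions into a nested-tuple tree."""
        if mask & (mask - 1) == 0:
            return mask.bit_length() - 1
        return (build(split[mask]), build(mask ^ split[mask]))

    return best[full], build(full)
```

Each subset of size i has on the order of 2^i partitions, and every partition is scored against all k quartets, which matches the k 3^n behaviour stated above. The table `best` holds one entry per subset, hence the 2^n memory.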

6 Concluding Remarks 

The geometric algorithm and the dynamic programming algorithm were implemented
and, together with the quartet puzzling algorithm, tested on real data,
corresponding to weighted quartets over 15 taxa: 14 mammalian orders and an
outgroup taxon. (For quartet puzzling, the weights were ignored.) These quartets
were derived from the HOVERGEN database. The results are described in Table 3.
Running times are measured on a $30K machine, in 1997 prices (Sun Ultra-4 at
300 MHz). The fact that the score of the optimal tree is only 73.4% indicates
that the input is not very reliable. We note that despite this fact, the
resulting trees seem to make biological sense.

Table 3. Results of running the three algorithms on real sequence data

Method     Score (% upper bound)  % Satisfied Quartets  Running Time
Puzzling   71.6%                  68.1%                 10 seconds
Geometric  72.1%                  68.7%                 5 seconds
Exact      73.4%                  70.2%                 7 minutes



From the algorithmic point of view, there are a number of interesting open
problems related to quartet based reconstruction. For realistic datasets, any
decrease of the exponent's base for an “exact” algorithm will be significant.
(Currently the dynamic programming algorithm takes about 5 days for datasets with
n = 20 on the same hardware.) A different direction is to prove performance
guarantees for either the quartet puzzling or the geometric reconstruction
method. A tree chosen at random is expected to satisfy one third of the quartets.
However, we are not aware of any efficient algorithm for reconstructing trees
from weighted quartets whose output is guaranteed to be above one third of the
maximum.

Abstract. The development of efficient parallel programs requires expertise
in the application domain as well as deep knowledge of parallel
algorithms, languages and tools for the construction and execution of
parallel programs. We present a method to make such expertise available
in a domain specific tool set. Its construction is based on extensive
use of a variety of powerful reuse methods. It automates a large part
of the software construction process, such that users need not know
about parallelism.



1 Introduction 

The development of an efficient parallel program needs expert knowledge of the
architecture of the parallel machine and its development tools. The fundamental
differences between machines with respect to concepts such as shared or
distributed memory, synchronous or asynchronous communication, and tightly or
loosely coupled processors are visible in the programming language and in
different abstract programming models, and even give rise to different
algorithms. Hence, reuse of precoined solutions is much more difficult than in
the case of sequential programs, where software reuse can be based on one
programming model and highly portable languages.

In our approach a tool set makes the expert knowledge of a specific domain 
available, such that users can characterize their problem instance on a high level 
of abstraction without knowledge of parallelism. The tool set creates efficient 
parallel software using development tools, dedicated libraries and components. 
The tool set is constructed by experts in the particular domain of parallel pro- 
gramming. A variety of reuse methods is extensively used, e. g. libraries of generic 
components, software architectures, generators, manufacturing procedures. 

We understand this approach mainly as a demonstration of an engineering 
strategy, rather than as the development of particular production tools. 

The main aspects of our approach are

- We aim at automated software construction, rather than at the support of
interactive development steps.

- The reuse covers the construction process, and hence reaches beyond passive,
composable program structures.

- We aim at restricted problem domains, rather than at a wide range of parallel
programs.

This work was supported by DFG Sonderforschungsbereich 376: “Massive
Parallelität: Algorithmen, Entwurfsmethoden, Anwendungen”.

The approach has been applied in the construction of two tool sets, one for 
parallel branch-and-bound programs and one for parallel sorting. A complete 
description of the project and its results can be found in [Sj. The general en- 
gineering strategy and many ideas and techniques have been carried over from 
our construction of a successful tool set for a completely different domain: the 
Eli system for automated language implementation. 

The paper is structured as follows. In section 2 we describe a scenario of par- 
allel branch-and-bound implementation in order to show what kind of expertise 
needs to be made reusable. An overview of reuse methods is given in section
3. Section 4 introduces the concepts of the tool set. The remaining sections 
emphasize the most effective reuse methods within the tool set. 

2 Development of Parallel Programs 

The development of programs for parallel machines requires knowledge of a va- 
riety of specific topics: the application area, algorithmic methods for a parallel 
programming model that fits to the particular machine, parallel programming 
languages, and machine specific tools for program implementation and execution. 
High quality and efficient software requires expertise in all these areas. Hence, 
it is very desirable to make some knowledge of experts in parallel programming 
available to those who are experts in an application area. In the following we 
consider a more concrete scenario in order to elaborate what kind of expertise 
is needed to develop efficient parallel programs. 

Assume software is to be constructed for an application in the area of plan- 
ning of machine usage: A set of jobs has to pass a linear sequence of production 
stages. Each job needs a specific amount of time in each stage. Each stage is 
equipped with a certain number of equal machines. A schedule is to be computed
such that the time until the last job is finished is minimized. This problem
is called a flow shop problem with multiple processors (FSMP). It belongs
to the class of NP-hard combinatorial optimization problems. However, in
practice, problem instances up to medium size are solvable on massively parallel
machines using branch-and-bound algorithms.

A branch-and-bound algorithm explores a tree structured solution space.
The nodes represent partial solutions where some of the decisions are made, e. g.
some jobs are allocated in the schedule. There is a branch function which creates
successors of a given node by enumerating all possibilities of one more decision
being bound. Some leaves of the tree represent legal solutions. They have costs
associated which are to be minimized. The cost function is defined for the inner
nodes, too, such that it estimates a lower bound of any solution created from
that node. It is used to refine the most promising nodes first, and to avoid
exploring subtrees which cannot contain a better solution than the best one
found so far.
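
The sequential scheme just described can be sketched in a few lines. The sketch below is an assumption-laden illustration, not code from BBSYS: it uses best-first search with a priority queue, the function names are hypothetical, and the example problem is a toy assignment problem rather than the FSMP scenario. Note how the generic part (`branch_and_bound`) is separate from the problem specific functions `branch`, `bound`, and `is_solution`.

```python
import heapq
from itertools import count


def branch_and_bound(root, branch, bound, is_solution):
    """Best-first branch-and-bound for a minimizing problem.

    bound(node) must be a lower bound on every solution below node and
    must equal the true cost when node is a complete solution."""
    best_cost, best_node = float("inf"), None
    tie = count()  # tie-breaker so the heap never compares nodes directly
    queue = [(bound(root), next(tie), root)]
    while queue:
        lower, _, node = heapq.heappop(queue)
        if lower >= best_cost:
            continue  # prune: this subtree cannot beat the best solution so far
        if is_solution(node):
            best_cost, best_node = lower, node
            continue
        for child in branch(node):
            child_bound = bound(child)
            if child_bound < best_cost:
                heapq.heappush(queue, (child_bound, next(tie), child))
    return best_cost, best_node


# Problem specific part (toy example): assign each row a distinct column.
COST = [[4, 2, 8],
        [4, 3, 7],
        [3, 1, 6]]
N = len(COST)


def branch(node):
    """A node is the tuple of columns chosen for rows 0..len(node)-1."""
    return [node + (col,) for col in range(N) if col not in node]


def bound(node):
    """Cost of the decisions made plus the cheapest free column per open row."""
    fixed = sum(COST[row][col] for row, col in enumerate(node))
    free = [col for col in range(N) if col not in node]
    return fixed + sum(min(COST[row][col] for col in free)
                       for row in range(len(node), N))


def is_solution(node):
    return len(node) == N
```

Calling `branch_and_bound((), branch, bound, is_solution)` returns the minimal total cost together with one assignment that attains it.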

At this point we can characterize what the minimal problem specific part of
the program is: the functions for branching, bounding, and cost computation, 
the data type that implements solution tree nodes, and the application program 
which provides the problem data and uses the result. Note that these program 
parts are not influenced by parallelism. The main part of the branch-and-bound 
implementation does not directly depend on the specific problem. It could be 
subject to reuse. 

The above description sufficiently characterizes a sequential branch-and- 
bound algorithm. However, parallel ones need a much deeper algorithmic con- 
sideration: Distributed worker processes explore sets of subtrees, which are to 
be stored in distributed data structures (priority queues). The efficiency heavily 
depends on suitable load balancing strategies to keep the workers busy without 
disturbing them by unnecessary communication. 

Additional tasks have to be solved for the parallel solution: initialization 
and termination of the worker processes, distribution of the problem data and 
output of the results. Hence, the modular decomposition of the parallel solution 
significantly differs from a sequential one. 

Even on this abstract level algorithms cannot easily be reused for any parallel
machine. They must fit with respect to different parallel programming models:
shared or distributed memory, synchronous or asynchronous message passing, or
data parallel concepts. The same holds even more on the level of implementation:
There are either programming languages specific for a certain programming
model, e. g. Occam-2 for distributed memory and synchronous communication
via channels, or otherwise general programming languages like C are extended
by constructs for a specific programming model. Hence, even if there are libraries
that support certain tasks, their portability is rather restricted. Interface
specifications like PVM and MPI provide some standardization at least for
the message passing model.

Finally, on the technical level of program installation, execution and debug- 
ging tools are used which are specific for the particular parallel machine. Even 
accessing a parallel machine often requires a significant amount of specific tech- 
nical knowledge. 

The above analysis demonstrates that all aspects considered below the level of
characterizing the problem specific part of the branch-and-bound method belong
to the expertise needed for the development of this kind of parallel program.
Furthermore, one easily recognizes that such expertise cannot simply be made
available by some library of functions. More sophisticated reuse methods are
needed.



3 Software Reuse 

Our approach for construction of domain specific tool sets is based on syste- 
matic application of the most powerful reuse methods in order to make expert 
knowledge available to non-experts. This section gives a brief overview of those
aspects of software reuse. A comprehensive survey and taxonomy can be found
in ini and a status report in EOl. 

Software reuse is the construction of software systems from existing compo- 
nents in contrast to their development from scratch. It is not only code that can 
be reused, but any kind of intermediate product of the development process, e. g. 
specifications and system structures. Even manufacturing procedures or decision 
rules may be subject to reuse. 

Reuse of components does not only decrease time and costs of software de- 
velopment. Even more important, reuse may increase software qualities of the 
products, e. g. reliability, adaptability, and efficiency. That effect can be achieved 
if experts for a particular task develop reusable components of high quality by 
applying state-of-the-art design and techniques. Although reusers do not have
that knowledge and those skills, they can reproduce such quality. Reuse may even
be what enables users to solve an implementation task at all. Take for example a
parser generator. It encapsulates sophisticated expert knowledge for the
construction of
powerful and efficient parsers. Most of its users would not be able to develop 
such products, even if they could spend plenty of time. 

In our approach a tool set is built by experts for a certain domain of paral- 
lel programming, and is used by non-experts to create software from high level 
descriptions. In terms of reuse methods such a tool set is considered as an
application generator [3]. It reuses complete system designs and creates
implementations
from specifications. In terms of our scenario in section 2, a branch-and-bound 
problem is specified, and a parallel implementation is created from it. 

Application generators have a high potential payoff, with respect to the
conceptual distance spanned from the specification to the implementation, as well
as in terms of the code expansion (ratio of specification size to program size).
However, such great leverage is only possible if the application domain is
sufficiently narrow to allow the automation, and if many different variants of
software in that domain need to be constructed.

For building such tool sets several powerful reuse methods have to be applied
in order to encapsulate a variety of expertise as discussed in the scenario of
section 2. The software architecture of systems in the application domain
is fixed up to a certain degree and reused for each of the created variants. In a
known modular decomposition of a system the interfaces between components
can also be fixed. Hence, variants of components can be created independently,
and a combinatorial explosion of the number of variants can be avoided. Krueger
characterizes software architecture reuse as large-scale system design reuse
and also points out the relation to application generators.

The components of a tool set have to cooperate in a certain preplanned way in
order to create a complex software system. Such software manufacturing
processes can be modeled, programmed and executed by tool control systems like
Make or Odin. Here the knowledge of construction steps, their dependencies,
input/output relations and parametrization, as well as intermediate results
are reused. The power of this reuse method lies in its programmable activities,
which even allow rule based decisions in the construction process to be reused.
The method belongs to the category of more general software development processes
where both persons and tools are involved.

Generic instantiation of software components mainly supports reuse of
program structures, data structures, or algorithms. Abstract schemas are
reused on instantiation where the generic parameters are substituted by concrete
types, functions, statements, or other program constructs. The concept is
closely integrated in several programming languages, e. g. in Ada, C++, Eiffel.
The technique of macro substitution can yield similar effects, however without
any checks for consistency. Skeletons are based on comparable concepts:
They describe parallel algorithmic schemas in the notation of higher order
functions.

Finally it should be mentioned that a system of actively cooperating tools 
can itself contain generators which contribute components to the whole product. 

The Eli system is an example of a tool set that is constructed on
the basis of reuse methods as described above. Its domain is the construction of
software systems for language processing. It can be considered as an application
generator that consists of many cooperating tools.

The software architecture of the generated products is a widely accepted de- 
composition model for language processors. Its fixed interfaces guarantee that 
the components created by different tools fit together. Several generators are 
integrated in the Eli system, e. g. generators for scanners, parsers, attribute 
evaluators, and definition tables. There is even a generator which completes the 
concrete and the abstract grammar and creates the parser grammar as input for 
the parser generator. It is clear that such activities form a complex manufactu- 
ring process. It is described for and controlled by the Odin system. 

A library of reusable generic modules is embedded in the system. A module
implements the solution of a common subtask of language implementation, e. g.
basic concepts for name analysis according to scope rules as in C. Several
instantiations make it possible to have different scope rules for different kinds
of identifiers, e. g. variables and labels. Such a module consists of
specifications for generators and program components written in several different
languages. Hence, the generic parameter substitution has to be done by language
independent text replacement.

By this means the Eli system embodies expert knowledge in the domain of 
language implementation, and provides state-of-the-art implementation techni- 
ques for non-experts. Its principles have provided ideas, methods, and techniques 
for the construction of the tool sets for parallel program domains. 

4 Tool Set Structure 

This section describes the overall structure of tool sets for the development of 
domain specific parallel programs according to our approach. We illustrate the 
description using components and properties of BBSYS, the system for construc- 
tion of programs that use branch-and-bound algorithms. However, the concepts 
apply to other application topics as well. 

BBSYS incorporates expert knowledge for the implementation of parallel
branch-and-bound programs. The support reaches from mechanisms of acces- 
sing the parallel machine to load balancing strategies for distributed worker al- 
gorithms. The knowledge is encapsulated such that users need not be concerned 
with aspects of parallel solutions. They just contribute information that charac- 
terizes their problem instance, e. g. data types representing branch-and-bound 
solutions. 

From the user's view the system is organized in four levels (Fig. 1). They
represent increasing levels of abstraction of support for parallel program
development, from access to the parallel machine up to software creation in the
specific application domain.



[Figure: the four conceptual levels of the tool set, from bottom to top: basic
techniques for parallel computers (compilers, access, execution, basic
libraries); parallel basic services and algorithms (priority queues, PPBB);
parallel algorithmic solutions (branch-and-bound); domain specific models
(generators, libraries). The levels rest on parallel programming languages and
parallel computers. Each level is annotated with the kind of user it serves,
from the developer of programs for the parallel computer up to the user who
constructs variants in his application domain.]

Fig. 1. Conceptual levels of the tool set



The lowest level (1) abstracts from techniques that are specific for the
particular parallel machine, e. g. compilation, configuration, and execution of
programs, and allocation of processors. The tools on this level are scripts that
call compilers and other components of machine specific platforms. Standard
libraries for message passing are also allocated on this level. At present several
different platforms are supported, e. g. workstation clusters, Transputer systems,
and the SIMD computer MasPar MP1. On this level mainly manufacturing procedures
are reused.

Level 2 contributes utilities, parallel algorithms, and data structures which
are typical for the particular application domain. In case of BBSYS those are
load balancing libraries and parallel implementations of priority queues, like the
PPBB library. Hence, we have reuse of precoined domain specific standard
components.

Level 3 is the algorithmic level. Here the modularly decomposed components of
the algorithmic solutions are allocated, in our case for branch-and-bound
algorithms. Variants for different problem characteristics and different
programming paradigms are organized in a library. The library modules have generic
parameters which are substituted on instantiation by problem specific data types
and operations. The structure of the resulting program is determined by a domain
specific
software architecture for branch-and-bound programs. Hence, on this level users 
select module variants from the library to be used for certain components in the 
software architecture and supply generic parameters for the particular program 
instance. This level hides the parallel solution from the user. His contributions 
can be made in terms of sequential branch-and-bound concepts, i. e. branching, 
bounding, and cost functions, and a data type for solutions. We here have reuse 
of generic schemes of algorithms and data structures within a fixed software 
architecture. 

The topmost level is intended for non-algorithmic application domains. They 
are suited for this approach if software in that domain has to be constructed in 
many variants, and if a subtask has to solve complex problems that require paral- 
lel machines to be used. Software for planning machine usage in large production 
environments could be such a domain. That level usually contains generators 
which create programs from descriptions in domain specific languages. On this 
level users in general need not program at all. In BBSYS this level is not used. 

5 Domain Specific Software Architecture 

The tool sets are constructed to produce software variants for many problem 
instances in their application domain, in this case branch-and-bound programs. 
The requirements usually vary in several dimensions, e. g. whether the problem is
minimizing or maximizing, whether one or all solutions are computed, and whether
depth-first or best-first search is applied. The consequences may affect different
parts of the product. Hence, it is
extremely important to have a suitable modular decomposition that is applica- 
ble for all variants. Then consequences of a requirement can be localized. For 
example, the software architecture holds for all target machines. For a parti- 
cular one specific modules are selected for certain components in the software 
architecture. Thus, the construction of variants is kept manageable. 

Fig. 2 shows the software architecture for BBSYS. The boxes represent modules,
the arcs show the use relation. The central concept of a parallel
branch-and-bound implementation is realized in the worker module together with
the data module for distributed work load. Variants are created for example by
selection of alternative modules for load balancing depending on the network
topology of the target machine.

The modules above the shaded part represent the part of the application 
program which supplies the problem data and uses the solution from the branch- 
and-bound algorithm, as well as the problem specific data types and functions 
required by the generic instantiations. They consist of sequential code and are 
provided by the user. 

Fig. 2. Software architecture for parallel branch-and-bound



Of course, most of the modules are generic. They are instantiated with generic
parameters, like data types for problem instances and solution representation,
and with functions for branching, bounding, cost computation, etc. We use a very
general, language independent mechanism for generic instantiation. It replaces
generic parameters just textually. It allows any kind of program construct to
be substituted on instantiation, e. g. types, statements, expressions, or complete
functions. The module designer has to use this technique carefully, because there
are no checks for consistency of parameter substitution as in the more restricted
genericity of programming languages.
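
Purely textual instantiation of this kind can be illustrated with a small sketch. It assumes, for illustration only, that generic parameters are written as `$NAME` placeholders in a module text; the placeholder syntax, the C fragment, and the function `instantiate` are hypothetical and do not reproduce the mechanism used in BBSYS or Eli.

```python
from string import Template

# A hypothetical generic module: a C fragment with two generic parameters.
PRIORITY_QUEUE_MODULE = Template("""\
typedef $NODE_TYPE pq_item;
static int pq_less(pq_item a, pq_item b) { return $COST_FN(a) < $COST_FN(b); }
""")


def instantiate(module, **params):
    """Replace every generic parameter textually.

    A missing parameter raises KeyError, but the substituted text itself is
    never checked: any string is accepted for any parameter."""
    return module.substitute(**params)


instance = instantiate(PRIORITY_QUEUE_MODULE,
                       NODE_TYPE="struct node *",
                       COST_FN="node_cost")
```

The same call with `NODE_TYPE="42 +"` succeeds just as well and silently yields text that no C compiler accepts, which is the lack of consistency checks that the module designer has to keep in mind.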



6 Software Manufacturing 

Our tool sets build software for parallel machines. The manufacturing process
itself is rather complex and involves several different tools and many intermediate
products. It incorporates expert knowledge which is reused in order to automate
the construction. It may be considered as a workflow model for specific software
development which in our special case is executed without human interaction.

A typical example of a rather sophisticated process is contained in the lowest
level of our tool sets. It comprises the steps necessary to install and execute a
program on a SC-320 Transputer system using the INMOS-C-Toolset. Fig. 3
illustrates the process for a sorting program that consists of two C modules and
a description of the process topology (in the top row). The latter is processed by
a generator, which creates configuration information to be added to the compiled
and linked C modules, and to be used to allocate processors on the machine.

Fig. 3. Reuse of a manufacturing procedure (INMOS-C-Toolset)



Another situation where active manufacturing steps are needed in our tool
set is the selection of one of several algorithmic variants, e. g. of a load balancing
module, on the basis of problem characteristics and machine properties.

The complete manufacturing process throughout all levels of our tool set
is modeled for the tool control system Odin. A so-called derivation graph is
formulated in Odin's specification language. It describes the inputs and outputs
of each tool call and the data flow through the system. Odin's interpreter uses
that graph to determine which steps are to be executed, with which arguments and
in which order, for a given request. The intermediate products are kept in a cache.
They are reused for a subsequent request if the products they depend on have not
changed. This is one of Odin's advantages compared with the Unix tool Make,
which serves similar purposes.
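
The idea of a derivation graph with cached intermediate products can be sketched as follows. This is not Odin's specification language or its interpreter; the class `DerivationGraph`, the rule format, and the stand-in "tools" are assumptions made for illustration. The step names loosely follow the sorting example of Fig. 3.

```python
import hashlib


class DerivationGraph:
    """Hypothetical model: rules map a product to (tool, input names)."""

    def __init__(self, rules, sources):
        self.rules = rules        # product name -> (tool function, [inputs])
        self.sources = sources    # source name -> text
        self.cache = {}           # product name -> (input fingerprint, product)
        self.executed = []        # log of the steps that were actually run

    def request(self, name):
        """Derive a product, running only steps whose inputs have changed."""
        if name in self.sources:
            return self.sources[name]
        tool, inputs = self.rules[name]
        values = [self.request(i) for i in inputs]
        fingerprint = hashlib.sha256("\0".join(values).encode()).hexdigest()
        cached = self.cache.get(name)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]      # inputs unchanged: reuse the cached product
        product = tool(*values)
        self.cache[name] = (fingerprint, product)
        self.executed.append(name)
        return product


# Stand-in tools; real ones would call a compiler, linker, or generator.
def compile_c(src): return f"obj({src})"
def link(*objs): return f"linked({','.join(objs)})"
def gen_config(topology): return f"config({topology})"
def configure(program, config): return f"bootable({program};{config})"


graph = DerivationGraph(
    rules={
        "sort.o": (compile_c, ["sort.c"]),
        "merge.o": (compile_c, ["merge.c"]),
        "prog": (link, ["sort.o", "merge.o"]),
        "config": (gen_config, ["topology"]),
        "bootable": (configure, ["prog", "config"]),
    },
    sources={"sort.c": "sort", "merge.c": "merge", "topology": "ring(4)"},
)
```

A first `graph.request("bootable")` runs all five steps. After only the topology description is changed, a second request reruns the configuration generator and the final configuration step, while the compiled and linked products come from the cache.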

The reuse of manufacturing processes as described in this section is typical 
for our approach. Its effects can not be achieved by other powerful reuse methods 
which are based on passive program structures, e. g. skeletons, object-oriented 
hierarchies, or frameworks. 

7 Program Generators

Program generators are one of the most powerful reuse methods. They incorpo- 
rate the knowledge of how to create a solution from a declarative description of 
a problem instance. 

As an example we consider a tool that creates configuration files for parallel
programs. The network structure of machines like the SC-320 Transputer
systems can be adapted to fit the process structure of the program, in
order to save time for routing steps. To do this, the INMOS-C-Toolset
requires three configuration files to be supplied: the desired processor net, the
process net from the view of the program, and a mapping between both. As the file
contents are closely related and the notation is rather clumsy to write
manually, a generator is well suited for that task. It creates the three files from
a single annotated graph, which is drawn using a general purpose graph editor.
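
A generator of this kind can be sketched as a function from one annotated graph to three consistent descriptions. The sketch assumes the graph is given as a placement of processes on processors plus a list of channels; the output lines are an invented notation and deliberately not the configuration syntax of the INMOS-C-Toolset, and `generate_configuration` is a hypothetical name.

```python
def generate_configuration(placement, channels):
    """Derive processor net, process net, and mapping from one graph.

    placement: process name -> processor name (the node annotations)
    channels:  list of (sender, receiver) process pairs (the edges)"""
    # Structural consistency check: every channel end must be a known process.
    for sender, receiver in channels:
        for process in (sender, receiver):
            if process not in placement:
                raise ValueError(f"channel end {process!r} is not a declared process")

    process_net = [f"channel {s} -> {r}" for s, r in channels]
    # Processors need a link wherever two connected processes are placed apart.
    links = sorted({tuple(sorted((placement[s], placement[r])))
                    for s, r in channels if placement[s] != placement[r]})
    processor_net = [f"link {a} -- {b}" for a, b in links]
    mapping = [f"place {p} on {placement[p]}" for p in sorted(placement)]
    return {
        "processor_net": "\n".join(processor_net),
        "process_net": "\n".join(process_net),
        "mapping": "\n".join(mapping),
    }


files = generate_configuration(
    placement={"master": "T0", "worker1": "T1", "worker2": "T2"},
    channels=[("master", "worker1"), ("master", "worker2")],
)
```

Because all three descriptions are derived from the same graph, they cannot drift apart, which is the main benefit over writing the three files by hand.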

This generator raises the level of abstraction significantly, from the low level
operational notation of configuration files up to a graph notation that also has
a visual representation. Furthermore, the generator checks for structural
consistency. This generator supports the design of almost arbitrary process nets.
That facility is not used in BBSYS. The generator is integrated into an earlier tool
set of ours which supports program development for communication structures
that can be specified explicitly.

8 User Interface for Software Configuration 

Using a tool set like BBSYS can be considered as a software configuration task: 
The user provides several items of information, which describe properties of the 
particular problem instance, state requirements on the results, and contribute 
program fragments for specific data types and functions. The items are related 
to each other, and they have to be consistent and complete with respect to a 
set of rules that drive the software construction process. Those rules and the 
design of the input information structure encode expert knowledge of correct 
and effective usage of the system. It is embedded in a configuration program 
and its graphical user interface. 

The configuration program is generated by the tool LaCon from a description
of the structure and the kind of information, and from rules for consistency and
completeness. According to the description it creates a window hierarchy with
suitable input components and an implementation of the rules and of a help
mechanism. LaCon was initially developed as a tool for specifying variants
of domain specific languages. However, it is sufficiently general to be used
as a generator for configuration programs in any domain.

The input items are organized in a hierarchy of windows. Each window comprises
information on a certain topic. Fig. 4 shows the root window and two
windows from levels below it. The hierarchy expresses the level of abstraction
of the information: upper levels are expressed in terms of branch-and-bound
concepts, whereas lower levels allow the user to influence the created software
more directly. Information on those detailed levels can be omitted; defaults are
then used instead. Users may completely abstract from the parallel
branch-and-bound implementation.

Fig. 4. Configuration with graphical user interface: (a) General specifications,
(b) Main menu, (c) Target machine

The input components are generated from a small but sufficient number
of graphical widgets for different kinds of information: e.g. a binary decision
whether a minimizing or maximizing problem is stated, a number for the branch
degree if it is fixed, a selection out of several target machines. Program fragments
like a type or a function are contributed using dedicated editors. They present 
frames to be filled in for the particular item. 

Fig. 5 shows an example of the violation of a consistency check. It is taken
from a detailed specification level. The selected local heap strategy is not
compatible with the request for backtracking mode. The error description is
automatically generated from the violated rule.
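
The idea of generating an error description from the violated rule can be
sketched as follows. This is only an illustration under stated assumptions: the
rule texts, the configuration keys (search_mode, local_heap, direction) and the
value 'stack' are hypothetical and are not taken from BBSYS or LaCon.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Config = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """A consistency rule: a readable description plus a check on the configuration."""
    description: str
    holds: Callable[[Config], bool]


# Hypothetical rule set; the actual BBSYS rules are not given in the text.
RULES: List[Rule] = [
    Rule(
        "backtracking mode requires the local heap strategy 'stack'",
        lambda c: c.get("search_mode") != "backtracking"
        or c.get("local_heap") == "stack",
    ),
    Rule(
        "an optimization direction ('min' or 'max') must be chosen",
        lambda c: c.get("direction") in ("min", "max"),
    ),
]


def violations(config: Config) -> List[str]:
    """Return one generated error message per violated rule."""
    return [f"Violated rule: {r.description}" for r in RULES if not r.holds(config)]


if __name__ == "__main__":
    bad = {"direction": "min", "search_mode": "backtracking",
           "local_heap": "priority_queue"}
    for message in violations(bad):
        print(message)
```

Because each message is derived from the rule's own description, adding a rule
automatically adds the corresponding diagnostic.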

9 Conclusion 

We presented a method for construction of tool sets which support the
development of parallel programs in specific domains. Expert knowledge on the
efficient use of parallel machines, languages, programming models, and
algorithmic methods, and on the software manufacturing process is made available
to non-experts in parallel programming.

Fig. 5. Consistency violation

Our approach is based on powerful methods of reuse to automate software
construction: A domain specific software architecture is the basis of
compositional reuse. Variants are created by selecting and instantiating generic
modules from a library. The manufacturing process is modeled for and executed by
a tool control system. A generated configuration program with a graphical user
interface guides users towards consistent and complete specifications on a high
level of abstraction.

The approach has been demonstrated by a tool set for parallel
branch-and-bound programs. It has been successfully used for the construction of
flow shop scheduling software. The tool construction process has been repeated
for another algorithmic domain, parallel sorting. It is left for future work to
apply the approach to further domains, and to get even closer to applications by
domain specific generators on the topmost tool level as described in section 4.

Abstract. The tight connection which exists between the fragment of
Prolog now known by the name Datalog [21] and the various calculi and
algebras for Relational Database Systems was observed in several places
in the late 1970s and early 1980s. The problem was to make this idea
operational and to build a system which implemented it. Such systems
are known today as Deductive Databases.

We describe the history of a little-known project from the mid 1980s
in which a prototype realizing this goal was produced. We explain why the
Relational Database system called Business System 12, developed by
IBM in the Netherlands, which became operational in 1983, turned
out to provide the right functionality. We also indicate how this project
influenced subsequent projects aimed at enhancing the degree of
declarativeness in interfaces with database systems.^



What we understand we can formalize; what is formalizable can be automated, 
and you will discover that someone has built it. 



1 The Great Idea 

Deductive Databases arose in the 1980s out of the confluence of three
technologies which came of age by the end of the 1970s. First, by 1980 the
Relational Database had become the preferred model of database technology.
Secondly, due to the initial success of Expert Systems, Artificial Intelligence
had returned to grace. Finally, with the proliferation of the language Prolog
the concept of Logic Programming became a prominent activity in Artificial
Intelligence, if not in computer science in general.

People observed that, notwithstanding the substantial differences in
conceptual frameworks, these three approaches to conceptual information
processing were storing and combining information in similar ways. Thus the idea
arose to substitute the computational strategies from one field for the slower
methods used in another. For example, one might use the database engine from a
Relational Database system in order to compute the results which would be
produced by backtracking in Prolog.

^ Evidently the actual developments in the fields of Databases, Logic Programming
and Artificial Intelligence go far beyond the small part which the authors observed
and participated in. The observations and opinions expressed in this paper therefore
don’t pretend to present a complete view of history, but rather reflect the insights
and positions held by the authors during these developments.

Such a cross fertilization of technologies held great promise for all fields
involved. Databases were much more powerful when crunching raw data than
interpretation based Prolog systems. Secondary benefits from Database
technology, like security, sharing and persistence, would become available to AI
almost for free. On the other hand, database technology would benefit from the
declarativeness of Prolog, where one can introduce new concepts and use them in
the same way as concepts stored in the system directly. It would also provide a
monolingual alternative for the rather heterogeneous ad hoc formalisms used for
providing structural information about data and constraints to the database.
That at the procedural level the match would be less than perfect (e.g., the
order in which answers would be produced would be different in the two worlds)
was something one could live with.

In this section we give our perspective on the state of the art in Databases, AI
and Logic Programming around 1982. With a familiar example we show how
relations are expressed in the three formalisms. Finally we explain our
compilation based approach to building a Deductive Database. In section 2 we
describe our project from 1984, where a fragment of Prolog indeed was compiled
onto an existing database system. How the functionality of our prototype can be
extended is discussed in the third and final section.

The reader should note that a number of terms naming and identifying con- 
cepts relevant to our project (terms like Deductive Databases and Datalog) still 
had to be invented in 1984. Using these phrases in our paper therefore is in fact 
an anachronism. 



1.1 The Theater 

On the left side of the scene a team of engineers is programming a huge main- 
frame machine; on the other side a group of logicians is writing strange looking 
formulas on a blackboard. . . . 



Relational Databases. The main cause for the success of the Relational
Database model as proposed by Codd [?] is its clean and conceptual semantics over
a well understood mathematical model: relations are subsets of Cartesian
products of Domains. Relations can be used both for describing entities (objects)
by listing their characteristic properties as a tuple of values, and for
describing conceptual relationships between such entities. The conceptual world
of the system designer (as represented in an entity-relationship scheme [?]) can
be mapped to a Relational Database scheme, consisting of domains, attributes,
tables and various constraints. This mapping is well understood and forms a basic
ingredient of elementary database courses.

Relations in mathematics are used to represent the meaning of predicates in
first order predicate logic (standard Tarskian semantics), so it is natural to use
first order logic as a language for talking about the contents of databases. By
1980 the appropriate fragment of first order predicate logic had been described.
This fragment today is known in various syntactic dialects by the names domain
calculus or tuple calculus. The core of the prominent language SQL is a dialect
of tuple calculus.

These calculi can be used to give a declarative description of derived relations
in terms of primitive relations stored in the database^. Derived relations are
evaluated by subjecting the primitive relations to suitable algebraic operations
which live in the mathematical structure known by the name relational algebra.
Well known relational operations are select, project, union, product, (natural)
join and difference. However, a single description may correspond to a large
variety of equivalent expressions in the relational algebra, all yielding the same
result, but with possibly different processing costs. It is the task of a query
optimizer to produce, for a given logical or algebraic expression, an equivalent
one with optimal processing costs.
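
As a minimal sketch of two equivalent algebraic expressions, the following
Python fragment implements select, project, rename and natural join over lists
of dictionaries and evaluates the query for the grandchildren of one person in
two ways: once selecting after the join, once selecting before it. The function
names and the data are illustrative assumptions; in particular the person Anna
is invented here as a common parent of Gerti and Dorothy.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Relation = List[Row]


def select(rel: Relation, pred: Callable[[Row], bool]) -> Relation:
    """Keep the rows satisfying the predicate."""
    return [row for row in rel if pred(row)]


def project(rel: Relation, attrs: Iterable[str]) -> Relation:
    """Keep only the given attributes and remove duplicate rows."""
    attrs = list(attrs)
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def rename(rel: Relation, mapping: Dict[str, str]) -> Relation:
    """Rename attributes according to the mapping."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rel]


def join(left: Relation, right: Relation) -> Relation:
    """Natural join: combine rows that agree on all shared attributes."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                out.append({**l, **r})
    return out


# Illustrative data; "Anna" is a hypothetical common parent.
PARENT = [
    {"PAR": "Anna", "CHILD": "Gerti"},
    {"PAR": "Anna", "CHILD": "Dorothy"},
    {"PAR": "Gerti", "CHILD": "Jessica"},
]
# The same table seen as (grandparent, parent) pairs.
GRAND = rename(PARENT, {"PAR": "GP", "CHILD": "PAR"})


def plan_select_late(gp: str) -> Relation:
    return project(select(join(GRAND, PARENT), lambda r: r["GP"] == gp), ["CHILD"])


def plan_select_early(gp: str) -> Relation:
    return project(join(select(GRAND, lambda r: r["GP"] == gp), PARENT), ["CHILD"])


if __name__ == "__main__":
    print(plan_select_late("Anna"), plan_select_early("Anna"))
```

Both plans return the same relation; the second one hands a smaller operand to
the join, which is the kind of rewriting an optimizer looks for.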

An important shortcoming of the Relational Database model is the lack of
abstraction mechanisms at the level of domains: what for one observer is a
primitive value may turn out to be a composed structured object for another. In
the relational model attribute values are selected from domains consisting of
atomic values only. The need for composite structured values has resulted in the
nested relational model. The computer industry, however, has moved in a different
direction: in the world of Object Orientation composite structured objects are
the norm, and by making them persistent the Object Oriented Database is obtained.
However, by taking this road all the secondary functionality of databases had to
be recreated, and along the way the nice features of the relational model,
particularly those related to its clean mathematical semantics and
declarativeness, were lost. Object-Relational Databases [?] represent an attempt
to preserve the best of both worlds.

These semantic issues on how to represent information and how to express 
queries, should not make us forget the prime purpose of a database system: 
providing a secure, stable and persistent environment where a community of 
users may concurrently access, query and update large volumes of shared data. 
Database technology primarily is concerned with the technological problems of 
realizing these “secondary” features. 



Artificial Intelligence and Logic Programming. In this paper an Arti- 
ficial Intelligence project is a project involving some form of logical inference 
by symbolic means. Other important activities in AI like natural language pro- 
cessing, image processing, robotics, knowledge representation and extraction or 
computational research on models of the human mind will be disregarded. 

^ Today, in Deductive Databases the terms intensional and extensional are used for
expressing this distinction.

Researchers in symbolic inferencing and theorem proving in the early
development of AI used first order predicate logic as a universal representation
language. Resolution based theorem provers were used and sometimes obtained
reasonable results. Resolution is a strategy for theorem proving which hardly
resembles the strategies used by humans. For a full treatment of the strategy we
refer to standard textbooks like [?]. The strategy requires several steps. The
logical expression of assumptions and the (negated) conclusion are transformed
into a collection of clauses built from literals having terms as arguments. The
resulting set of clauses is subjected to an iteration of unifications,
substitutions and resolution steps, aiming for a derivation of the empty clause.
If this succeeds one obtains a refutation proof showing that the negated
conclusion is inconsistent with the assumptions, thus proving the required
result. If the resolution proof fails the substitutions computed can be used for
constructing a countermodel for the conclusion.
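
The refutation idea can be illustrated in the propositional case, where no
unification is needed. The sketch below is a simplification under that
assumption, not the first order procedure described above: clauses are sets of
literals, a leading "~" marks negation, and the function names are chosen here
for illustration.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Clause = FrozenSet[str]


def negate(literal: str) -> str:
    """Return the complementary literal ("p" <-> "~p")."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1: Clause, c2: Clause) -> Set[Clause]:
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    out = set()
    for literal in c1:
        if negate(literal) in c2:
            out.add(frozenset((c1 - {literal}) | (c2 - {negate(literal)})))
    return out


def refutes(clauses: Iterable[Iterable[str]]) -> bool:
    """True if the empty clause can be derived, i.e. the clause set is unsatisfiable."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:          # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= known:           # saturation without contradiction
            return False
        known |= new


if __name__ == "__main__":
    # Assumptions: p, and p -> q (written as ~p or q); negated conclusion: ~q.
    print(refutes([{"p"}, {"~p", "q"}, {"~q"}]))
```

In the propositional case the loop always terminates, because only finitely
many clauses exist over a fixed set of literals; the difficulties discussed in
the text arise once terms and substitutions enter.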

The bad news is that the resolution strategy is highly nondeterministic:
indefinite clauses lead to a branching structure of cases which all should be
explored. Moreover, iterated substitutions on terms occurring in the clauses may
lead to a combinatorial explosion. Worst of all, the method is incomplete, as some
satisfiable formulas turn out to possess only infinite models, which will never
be searched out to completion in a resolution proof.

In many practical situations this combinatorial explosion doesn’t occur,
which makes actual work in AI feasible. People saw a reason why this happened.
Frequently the formulas used belong to the fragment known by the name
Horn clauses, clauses with at most one positive literal. This property reduces
the number of available literals on which a clause may be resolved to at most
one, and thus one of the important sources of the combinatorial explosion is
removed. The nondeterminism of the resolution method can be replaced by the
deterministic depth first search backtracking strategy known today by the name
SLD resolution. The language Prolog, which originated out of a natural language
processing project but which to a large extent was founded on Kowalski’s work
on resolution theorem proving, actually has SLD resolution for the Horn Clause
fragment as its computational model [?].

Using SLD resolution the combinatorial explosion due to the growth of terms
remains. However, if we remove from Prolog the function symbols in terms
(together with many features residing outside the computational core, like
built-in predicates having side effects) we obtain the “pure” core language known
today by the name Datalog.

One of the claimed advantages of the language is its declarativeness: a Prolog
program is a list of first order formulas which express the structure of the model
to be constructed, just written in a rather unconventional dialect of predicate
logic. On the other hand there exists a Prolog interpreter which actually builds
the model described by the rules, and the user doesn’t have to be bothered
about the details of how the interpreter obtains its results. Note however that
there is a major semantic difference with the interpretation of logic used by
ordinary mathematicians: in predicate logic a set of formulas describes a family
of models, whereas the Prolog interpreter produces a preferred minimal model.
For people working in AI this difference was no problem at all, since they were
looking for minimal models as interpretations anyhow. Curiously enough such a
minimal model is also taken to be the natural interpretation for the contents of
a (relational) database.
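
For function-free rules the minimal model can be computed by applying the rules
to the known facts until nothing new is derived. The sketch below shows this
naive bottom-up iteration in Python; it is an illustration, not the method of
any particular system. Variables are written with a leading "?", the predicate
name "neq" stands for inequality, and the facts (including the invented common
parent Anna) are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, ...]           # (predicate, argument, argument, ...)
Rule = Tuple[Atom, List[Atom]]   # (head, body)
Subst = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(atom: Atom, fact: Atom, subst: Subst) -> Optional[Subst]:
    """Extend subst so that atom equals fact, or return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        term = extended.get(term, term)
        if is_var(term):
            extended[term] = value
        elif term != value:
            return None
    return extended


def minimal_model(facts: Set[Atom], rules: List[Rule]) -> Set[Atom]:
    """Apply all rules repeatedly until no new fact is derived."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            substs: List[Subst] = [{}]
            for atom in body:
                if atom[0] == "neq":   # inequality test on already bound variables
                    substs = [s for s in substs
                              if s.get(atom[1], atom[1]) != s.get(atom[2], atom[2])]
                    continue
                extended = []
                for s in substs:
                    for fact in model:
                        s2 = match(atom, fact, s)
                        if s2 is not None:
                            extended.append(s2)
                substs = extended
            for s in substs:
                derived = (head[0],) + tuple(s.get(t, t) for t in head[1:])
                if derived not in model:
                    model.add(derived)
                    changed = True
    return model


FACTS: Set[Atom] = {
    ("parent", "Anna", "Gerti"), ("parent", "Anna", "Dorothy"),   # Anna is hypothetical
    ("parent", "Gerti", "Jessica"),
    ("female", "Gerti"), ("female", "Dorothy"),
}
RULES: List[Rule] = [
    (("sister", "?X", "?Y"),
     [("parent", "?Z", "?X"), ("parent", "?Z", "?Y"), ("neq", "?X", "?Y"), ("female", "?X")]),
    (("aunt", "?D", "?J"), [("parent", "?G", "?J"), ("sister", "?G", "?D")]),
]

if __name__ == "__main__":
    for fact in sorted(minimal_model(FACTS, RULES) - FACTS):
        print(fact)
```

The result contains exactly the facts forced by the rules and nothing else,
which is the sense in which the model is minimal.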

A major contribution to the popularity of Prolog was the selection around
1982 by the Japanese of this language as the vehicle for their Fifth Generation
Project, and the impact this choice had on research in computer science around
the world.

Logic programming is the Computer Science research area where the above
intuitive notions are investigated and formalized. Notwithstanding the fact that
the field dates from the 1970s, it took until the early 1990s before a
completely satisfactory, mathematically correct formalization of the theory of
logic programming, free from the errors caused by the interaction between
syntax and semantics, appeared in the literature.

1.2 The Players 

In the centre of the scene a group of people is having a picnic. They are
discussing how they are related to each other. . .

We stick to the traditional example of family relations among humans to illust- 
rate the similarities and differences between the various Computer Science fields 
mentioned above. 

Consider the statement that Dorothy is an aunt of Jessica. This can describe
two different relations: it may be the case that Dorothy is a sister of one of
Jessica’s parents, or it can mean that Dorothy has a husband Fred who is a
brother of one of these parents, thus becoming Jessica’s uncle. It can even mean
both relations at the same time. Sisterhood is a defined relation as well. Dorothy 
is a sister of Thomas, since they share parents and are not the same person; 
moreover Dorothy is a female person. Parenthood and marriage are relations 
which are not defined in terms of other relations, and the same holds for the 
gender, date of birth and name of people. Existence of a set of entities called 
people is assumed, together with the possibility of identifying people by name. 
These notions are considered to be basic notions when talking about family 
relations. A further distinction is that gender, date of birth and name are unary 
properties of people, whereas parenthood and marriage are binary relations. In 
fact marriage has a hybrid nature: as a relation it connects two people, and at 
the same time it is an entity in itself having properties like a date of marriage 
(and possibly a date of termination). 

First order logic is a convenient language to express such relations. In this 
particular case we might write something like: 

$parent(Gerti, Jessica) \land sister(Gerti, Dorothy) \rightarrow aunt(Dorothy, Jessica)$

or preferably, since it is a general rule, using logical variables

$\forall G, J, D\, [\, parent(G, J) \land sister(G, D) \rightarrow aunt(D, J) \,]$

A similar rule for sisterhood reads:

$\forall X, Y, Z\, [\, parent(Z, X) \land parent(Z, Y) \land X \neq Y \land female(X) \rightarrow sister(X, Y) \,]$

Information about the primitive relations in logic is expressed by atomic
assertions like: female(Gerti) or parent(Gerti, Jessica).

The corresponding rules written in Prolog become:

aunt(D,J) :- parent(G,J), sister(G,D).
sister(X,Y) :- parent(Z,X), parent(Z,Y), X \= Y, female(X).
female(X) :- person(X,'F',Any).

showing that the order of the implication is reversed, and that the quantifiers
are omitted (by convention free variables are understood to be universally
quantified). The basic information is presented by means of facts which are
stored in the Prolog database, which in fact is just a collection of lines in the
program:

person(Gerti, 'F', 19240713).
marriage(Fred, Dorothy, 19411013).

which illustrates that the unary properties of entities are written in a format 
resembling database tuples, rather than using a collection of unary predicates. 

In a Relational Database model people will be represented by a (stored)
database table whose structure can be expressed by a structure declaration like:

PERSON(NAME:STRING(20), GENDER:{'M','F'}, BIRTH:NUM(8) # KEY(NAME))

Gerti can now be represented by a tuple like (Gerti, F, 19240713) in the PERSON
table. Other base relations describing parenthood and marriage are represented
by relations as well:

PARENT(PAR:STRING(20), CHILD:STRING(20) # KEY(PAR,CHILD))
MARRIAGE(HUSB:STRING(20), WIFE:STRING(20), DATE:NUM(8)
         # KEY(HUSB,DATE), KEY(WIFE,DATE))

and by inserting the appropriate tuples the basic family relations are described. 

Having provided the required facts and rules we hope that our computer
system will indeed be capable of deriving the fact that Dorothy is an aunt of
Jessica.

In Prolog one may submit a query like: ?- aunt(Dorothy, Jessica). hoping
that the system will return the answer 'YES'. The Prolog interpreter, using its
SLD resolution, will first discover that Gerti is a parent of Jessica, and
subsequently attempt to solve the subgoal of establishing that Dorothy and Gerti
are sisters, which eventually will succeed by invoking the rule for sisterhood,
but not before having inferred along the way that Dorothy is female.
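
The depth first, backtracking search can be mimicked by a small Python
generator. The sketch below is a simplified stand-in for a Prolog interpreter,
restricted to function-free clauses; the representation (variables with a
leading "?", facts as clauses with an empty body, the predicate "neq" for
inequality) and the invented common parent Anna are assumptions of this example.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Atom = Tuple[str, ...]
Clause = Tuple[Atom, List[Atom]]   # (head, body); a fact has an empty body
Subst = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def walk(term: str, subst: Subst) -> str:
    """Follow variable bindings until a constant or an unbound variable is reached."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(goal: Atom, head: Atom, subst: Subst) -> Optional[Subst]:
    if goal[0] != head[0] or len(goal) != len(head):
        return None
    s = dict(subst)
    for a, b in zip(goal[1:], head[1:]):
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if is_var(a):
            s[a] = b
        elif is_var(b):
            s[b] = a
        else:
            return None
    return s


def rename(clause: Clause, suffix: int) -> Clause:
    """Give the clause fresh variable names for this resolution step."""
    def fresh(atom: Atom) -> Atom:
        return (atom[0],) + tuple(f"{t}_{suffix}" if is_var(t) else t for t in atom[1:])
    head, body = clause
    return fresh(head), [fresh(a) for a in body]


def solve(goals: List[Atom], subst: Subst, program: List[Clause],
          depth: int = 0) -> Iterator[Subst]:
    """Yield one substitution per successful derivation, leftmost goal first."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    if first[0] == "neq":   # assumes both arguments are bound by now
        if walk(first[1], subst) != walk(first[2], subst):
            yield from solve(rest, subst, program, depth + 1)
        return
    for clause in program:             # clause order determines answer order
        head, body = rename(clause, depth)
        extended = unify(first, head, subst)
        if extended is not None:
            yield from solve(body + rest, extended, program, depth + 1)


PROGRAM: List[Clause] = [
    (("aunt", "?D", "?J"), [("parent", "?G", "?J"), ("sister", "?G", "?D")]),
    (("sister", "?X", "?Y"),
     [("parent", "?Z", "?X"), ("parent", "?Z", "?Y"), ("neq", "?X", "?Y"), ("female", "?X")]),
    (("parent", "Anna", "Gerti"), []),     # Anna is hypothetical
    (("parent", "Anna", "Dorothy"), []),
    (("parent", "Gerti", "Jessica"), []),
    (("female", "Gerti"), []),
    (("female", "Dorothy"), []),
]

if __name__ == "__main__":
    print(any(solve([("aunt", "Dorothy", "Jessica")], {}, PROGRAM)))
    print([walk("?Who", s) for s in solve([("aunt", "?Who", "Jessica")], {}, PROGRAM)])
```

Each failed unification or inequality test simply ends that branch of the
generator, and the enclosing loop continues with the next clause, which is the
backtracking behaviour referred to in the text.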

Extracting the same information from the relational database would require
inventing an algebraic or logical relational expression representing the aunt
relation. We might describe the sister relation by:

SISTER(A,B) =
    SEL( PROJ( (PARENT(X,A) JN PARENT(X,B)), (A,B) ), A ≠ B )
    JN PROJ( SEL( PERSON(A,G,Arb), G = 'F' ), (A) )

and this complicated expression must be joined with the parent relation to
obtain one of the two contributing relations for aunthood. Once the view
definition for aunthood is constructed one can evaluate it on the database,
hoping that in the resulting table the tuple (Dorothy, Jessica) will show up. If
the database contains a large number of people it might even be advantageous to
submit a query selecting the tuple (Dorothy, Jessica) from the view describing
aunthood: the database optimizer may push the selection down in the algebraic
expressions, thus reducing the size of the intermediate results and the
processing time for the query.

Observe the distinction between the definition of this derived sister relation 
(called a view definition) and the table produced by evaluating it in the database 
(the view instantiation). 
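
The distinction between a view definition and its instantiation can be made
concrete with a short Python sketch in which relations are sets of tuples and
each step of the algebraic expression for the sister relation is written as a
set comprehension. The function names, the person Anna and all birth dates
other than Gerti's are assumptions made for illustration.

```python
from typing import Set, Tuple

# Stored tables (extension). Only Gerti's birth date comes from the text.
PARENT: Set[Tuple[str, str]] = {
    ("Anna", "Gerti"), ("Anna", "Dorothy"), ("Gerti", "Jessica"),
}
PERSON: Set[Tuple[str, str, int]] = {
    ("Gerti", "F", 19240713),
    ("Dorothy", "F", 19260101),    # hypothetical date
    ("Anna", "F", 19000101),       # hypothetical person and date
    ("Jessica", "F", 19500101),    # hypothetical date
}


def sister_view(parent, person) -> Set[Tuple[str, str]]:
    """View definition for SISTER(A,B), following the algebraic expression step by step."""
    # PROJ((PARENT(X,A) JN PARENT(X,B)), (A,B))
    pairs = {(a, b) for (x1, a) in parent for (x2, b) in parent if x1 == x2}
    # SEL(..., A != B)
    distinct = {(a, b) for (a, b) in pairs if a != b}
    # PROJ(SEL(PERSON(A,G,Arb), G = 'F'), (A))
    females = {name for (name, gender, _birth) in person if gender == "F"}
    # JN on attribute A
    return {(a, b) for (a, b) in distinct if a in females}


def aunt_view(parent, person) -> Set[Tuple[str, str]]:
    """View definition for the first contributing relation of aunthood."""
    sister = sister_view(parent, person)
    return {(d, j) for (g, j) in parent for (g2, d) in sister if g == g2}


if __name__ == "__main__":
    # Calling the function on the stored tables produces the view instantiation.
    print(aunt_view(PARENT, PERSON))
```

The functions play the role of view definitions; the sets they return when
applied to the current tables are the view instantiations.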

This example illustrates what declarativeness is all about: in Prolog we have 
specified the rules and the system navigates towards a solution. In a Relational 
database we have to do a lot of thinking before we can submit the right query 
to the database. 



1.3 The Play 

Disturbed by the fierce debate the engineers and logicians drop their activities 
and join the family, in an attempt to solve the dispute. . . 

In the above example the Prolog approach offers a reasonable degree of declara- 
tiveness. However, inspection of how the Prolog inference process will proceed if 
run against a substantial fact database, will show that the computational pro- 
cess can be quite inefficient. Moreover, there is no actual database system against 
which the Prolog system operates; the Prolog system contains a fact database 
stored explicitly in the program. Consequently none of the secondary benefits of 
database technology like persistence, security and sharing will be available. 

Declarativeness is also a property ascribed to the Database Interface language 
SQL. Since SQL is claimed to be relationally complete it should be possible to 
formulate an equivalent of the definitions given in Prolog. However, the resulting 
expressions will become far more complex than the ones given in the previous 
section. Worse, in the version of SQL available around 1982, due to the lack of 
a Union operator, it would have been impossible to express the aunt relation 
by means of an expression, and one would need a process for evaluating this 
relation. 

Even if the database interface language is truly relationally complete the
problem remains how to obtain the required relational expressions. In Prolog,
as in mathematics and logic, a complex definition is expanded into a chain of
simpler definitions, terminating at basic notions. The corresponding feature
in relational databases would be a view definition being expanded into a chain
of other view definitions, ultimately arriving at the tables stored directly in
the database. This requires view definitions to be invokable as easily as stored
relations, yet another feature which the database systems around 1982 didn’t
offer [20].

Finally the translation itself. Life will remain hard if every Prolog
definition must be translated by hand into a corresponding relational expression.
The translation, however, is not hard to automate. The strategy for performing it
can be found in textbooks like [?]. Yet there remains the difference between
knowing how to do it in principle and actually building a system doing it. Such
automated systems are now known by the name Deductive Databases.

The ideal of Deductive Databases is to combine the user friendly declarative 
world of Prolog with the information crunching power of existing relational da- 
tabase systems. Isolate a fragment of Prolog which can be evaluated by means 
of relational algebra on a database system, automate the translation, and build 
a system which uses as much computational power from the database engine 
as feasible. Next enjoy using for free the secondary benefits of the database sy- 
stem which you no longer have to reinvent for your AI system: sharing data, 
concurrency, transaction management, security and so on. . . 

The core fragment of Prolog for which this translation exists is known by
the name Recursion-free Safe Datalog. This fragment consists of rules describing
predicates in Prolog defined by a non-recursive^ system of Horn clauses, starting
from a finite collection of basic relations. Terms do not contain function symbols.
The rules moreover satisfy a safety condition preventing the occurrence of free
variables in the answer as given by the Prolog interpreter. This condition is
required in order to keep the derived relations bounded. For such rules the defined
predicates have an extent which can also be described by algebraic relational
expressions having base tables corresponding to the basic relations as atoms.
The translation process moreover can be mechanized.
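
Both conditions can be checked mechanically before a translation is attempted.
The following Python sketch uses one common formulation of safety (every
variable of the head, and of an inequality test, must occur in an ordinary body
atom) and detects recursion as a cycle in the dependency graph of predicates.
The rule representation and the predicate name "neq" are assumptions of this
example, not the notation of the prototype.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, ...]           # (predicate, argument, ...); variables start with "?"
Rule = Tuple[Atom, List[Atom]]   # (head, body)


def variables(atom: Atom) -> Set[str]:
    return {t for t in atom[1:] if t.startswith("?")}


def is_safe(rule: Rule) -> bool:
    """Head and inequality variables must be bound by an ordinary body atom."""
    head, body = rule
    needed, bound = variables(head), set()
    for atom in body:
        if atom[0] == "neq":
            needed |= variables(atom)
        else:
            bound |= variables(atom)
    return needed <= bound


def is_recursion_free(rules: List[Rule]) -> bool:
    """True if no predicate depends, directly or indirectly, on itself."""
    deps: Dict[str, Set[str]] = {}
    for head, body in rules:
        deps.setdefault(head[0], set()).update(a[0] for a in body if a[0] != "neq")
    visiting: Set[str] = set()
    done: Set[str] = set()

    def acyclic(pred: str) -> bool:
        if pred in done:
            return True
        if pred in visiting:       # reached again on the current path: a cycle
            return False
        visiting.add(pred)
        ok = all(acyclic(q) for q in deps.get(pred, ()))
        visiting.discard(pred)
        if ok:
            done.add(pred)
        return ok

    return all(acyclic(p) for p in list(deps))


FAMILY: List[Rule] = [
    (("sister", "?X", "?Y"),
     [("parent", "?Z", "?X"), ("parent", "?Z", "?Y"), ("neq", "?X", "?Y"), ("female", "?X")]),
    (("aunt", "?D", "?J"), [("parent", "?G", "?J"), ("sister", "?G", "?D")]),
]
ANCESTOR: List[Rule] = [
    (("ancestor", "?X", "?Y"), [("parent", "?X", "?Y")]),
    (("ancestor", "?X", "?Y"), [("parent", "?X", "?Z"), ("ancestor", "?Z", "?Y")]),
]
UNSAFE: Rule = (("aunt", "?D", "?J"), [("parent", "?G", "?J")])   # ?D is never bound

if __name__ == "__main__":
    print(all(is_safe(r) for r in FAMILY), is_recursion_free(FAMILY))
    print(is_recursion_free(ANCESTOR), is_safe(UNSAFE))
```

An unsafe rule such as the last one would make the derived relation depend on
the whole universe of values, which is why it is excluded from the fragment.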

As illustrated by the extensive literature on the topic Logic and Databases 
around 1980 this semantic relation itself was reasonably understood. Yet most 
projects in this area worked on the basis of the so-called tuple at a time approach. 
The AI system no longer needs to store its fact database inside the program, 
if the resulting facts can be represented by tuples in a relational database. But 
the combinatorial processing of this information is still performed at the side of 
the AI inference engine and therefore the power of the database system itself is 
severely under-utilized. None of the secondary benefits of database technology 
becomes available to the AI system. 

One may wonder why so few projects in the early 1980s went beyond this
tuple at a time approach. As we conjecture, the lack of an appropriate relational
database system may have been an important cause. . . .



2 The Prolog - BS12 Interface Project 

Inspired by the hype provoked by the Japanese Fifth Generation project and
its challenge to the rest of the world [?], the first author, using her experience
as a participant for several years in the development of IBM’s product Business
System 12, noted the correspondence between backtracking in Prolog and
evaluating expressions in a relational database. Initial ideas on this approach
were presented at a workshop at the Free University in Amsterdam, early 1984 [?].
This workshop was dominated by proponents of the tuple at a time approach,
resulting in disbelief among the audience. Still convinced of the suitability of
BS12 as a vehicle for realizing this idea, she arranged a student internship at
IBM for C.J.F. Doedens, who developed a working prototype within 9 months.
His report [12] constitutes the master’s thesis with which he graduated in
Computer Science at the University of Amsterdam. See also [?]. The most
accessible document describing the project is [?].

^ The full language Datalog also allows for recursive definitions; in the last section of
this paper we explain how such recursive rules can be treated in a compilation based
deductive database.

Since this paper focuses on the general ideas and historical backgrounds of
the project we will not present a complete technical description of the translation
process. The theory needed for constructing such a translator can be found in
textbooks like Ullman [?]. Instead we discuss the requirements on an RDBMS
in order to be a suitable platform for realizing a compilation based deductive
database system. BS12 indeed satisfies these requirements.



2.1 Business System 12 

Business System 12 is the name of a Relational database system developed by
IBM, the Netherlands, in the early 1980s, building on the work of a group at
Peterlee, UK. During the early stages of development there was little interaction
with database development projects going on in the USA like System R. Blauw
and Duyvesteijn from the University of Twente served as advisors. A first version
of BS12 became operational in 1983. The system was offered as a service from
the International Network in Zoetermeer in the Netherlands. There was never
an installation delivered to clients; users accessed the system in a time shared
environment. IBM management at some point in the 1980s decided no longer
to support the product, after which the system gradually disappeared. However,
a major application running on the system was a financial project built around
the XSHARE database storing stock exchange rates of some 100000 companies
over a period of five years, resulting in the (for this era tremendous) volume of
5GB! The fact that it turned out to be impossible to migrate this application
to DB2 without loss of performance and functionality was the main reason why
BS12 actually survived until approx. 1996.

It is difficult to find information about BS12 in the literature; the reference
manual [?] contains a full description but was made available primarily to users.
A more superficial description of the functionality in the style advocated
in [25] constitutes the unpublished document [?]. A paper on concurrency issues
appears in the proceedings of a Dutch national conference [?]. Andrew Warden,
in reality Hugh Darwen, one of the designers of BS12, mentions the system with
hindsight in [29].

In BS12 tables are relations, subsets of Cartesian products of domains, where
columns are identified by named attributes; the rows correspond to the tuples
in the relation. Domains are finite sets relating to one of the five basic types:
Character, Numeric, Name, Bit, Timestamp. The basic type Name represents
a sort of string with a normalization convention. It is the datatype used
for table and attribute names. To every domain a default value can be assigned.
Duplicate rows in a table are disallowed. Tables are homogeneous: all rows have
the same attributes and domains. The ordering of rows in a table is insignificant.

The environment in which the system had to operate, a time sharing
environment where several customers use the database system simultaneously, had
severe consequences for the system architecture. Since competitors might share
the use of the system they had to be protected against each other, both with
respect to accessing information and the denial of service. Consequently, having
a global catalogue of tables and/or views was unthinkable. Every user had access
to his/her private dictionaries with the possibility of opening items for shared
use.

A second important feature is the monolingual structure of the system
interface. BS12 focuses on tables, being relations interpreted as sets. But
processes, functions etc. are represented by objects which are also tables
(language tables), which makes these objects accessible to relational operators.
System information like the catalogues of tables, columns, or views is also
stored in tables.^

A third important feature is the fully algebraic character of the system. The 
number 12 in the name BS12 refers to the presence of 12 relational operators: 

— SELECT set comprehension on a table based on a selection criterion (subset 
of rows) 

— PRESENT a hybrid combining projection and renaming (subset of columns) 

— CALCULATE extends a table by a new attribute which functionally depends 
on the others where the dependency is given by an explicit formula 

— GENERATE production of a single row relation computed from constant values 

— SUMMARY an aggregate construct used for grouping, summation etc.

— JOIN the natural join 

— MERGE a version of the outer join where nonmatching rows are extended by
default values rather than nils

— QUAD Cartesian product of relations 

— DIFFERENCE relative set difference 

— EXCLUSION symmetric difference of two tables 

— INTERSECTION 

— UNION 

The Boolean operations (the last four in the list) do not require that the two 
argument relations are union compatible; if they are not the resulting table is 
projected on the attributes shared by both operands. 
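
The behaviour of the four Boolean operators on operands that are not union compatible can be sketched as follows. This is only an illustration in Python, not BS12 syntax: the representation of a relation as a pair (attribute names, set of rows), the helper names `relation`, `project` and `boolean_op`, and the sample tables are all assumptions made for the example. We also assume that both operands are projected on the shared attributes before the set operator is applied.

```python
from operator import and_, or_, sub, xor

def relation(attrs, rows):
    """Build a relation: a frozenset of attribute names plus a set of rows."""
    return frozenset(attrs), {frozenset(row.items()) for row in rows}

def project(rel, attrs):
    """Keep only the given attributes; set semantics removes duplicate rows."""
    _, rows = rel
    return frozenset(attrs), {
        frozenset((a, v) for a, v in row if a in attrs) for row in rows
    }

def boolean_op(op, left, right):
    """Project both operands on their shared attributes, then apply the set operator."""
    shared = left[0] & right[0]
    return shared, op(project(left, shared)[1], project(right, shared)[1])

# The four Boolean operators as plain set operators (names follow the list above).
UNION, INTERSECTION, DIFFERENCE, EXCLUSION = or_, and_, sub, xor

# Hypothetical sample data: the two tables share only the attribute "name".
staff = relation({"name", "room"},
                 [{"name": "ann", "room": 1}, {"name": "bob", "room": 2}])
clients = relation({"name", "city"},
                   [{"name": "bob", "city": "Bratislava"},
                    {"name": "cid", "city": "Brno"}])

attrs, rows = boolean_op(UNION, staff, clients)
names = sorted(dict(row)["name"] for row in rows)
```

Here `attrs` contains only `name`, and `names` lists the three distinct names occurring in either table.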

BS12 is algebraically complete: wherever a base relation can serve as an operand
in an expression its role may be performed by a relational expression as
well. Note that the result of evaluating an expression in this context always
represents a relation considered as a set. Every relational expression in the algebra
represents a view on the database. This view can not only be evaluated but the
expression defining the view can be stored in a language table, which itself can
be added to a user owned catalogue called the Table of Views. Moreover these
definitions can be stored and modified during a session.

Storing the catalogues in tables makes it possible to construct the “paradoxical”
Russell view consisting of those views which don’t contain themselves as a row.

As we will indicate in the sequel, both the algebraic completeness and the
monolinguality are instrumental in making BS12 a suitable platform for the
construction of a deductive database system.



2.2 The Translation of Safe Non-recursive Horn-Clauses into View 
Definitions 

Given a system of Horn-clause rules for a family of predicates, and given a
collection of basic predicates used in these rules, we want to interpret this system
as a definition of a collection of predicates in terms of the given basic ones.
Assuming that the basic predicates correspond to stored tables in a relational 
database we want to obtain for the derived relations relational expressions in 
terms of the basic tables with the intended meaning. 

In a preprocessing stage the clauses are ordered such that clauses sharing 
a predicate in the head are grouped together. The group defining a predicate 
which occurs in a subgoal in the body of some clause should precede the group 
containing this clause. Consequently clauses all of whose subgoals invoke predicates
corresponding to base tables or built-in predicates appear at the beginning of
the list. If the system of Horn clauses is not recursive such an ordering can be
obtained by topological sorting. For recursive families of Horn clauses such an 
ordering doesn’t exist. 
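
The preprocessing step can be sketched with a standard topological sort. The sketch below is an illustration only: the encoding of a rule as a pair (head predicate, list of body predicates) and the function name `order_predicates` are assumptions, and built-in predicates are expected to be listed among the base predicates. It relies on Python's `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

def order_predicates(rules, base):
    """Return the derived predicates in definition order, or None if recursive.

    rules: iterable of (head_predicate, [body_predicates]) pairs.
    base:  predicates backed by stored tables or built-ins; they need no ordering.
    """
    depends_on = {}
    for head, body in rules:
        # Group clauses by head predicate and collect the derived predicates
        # their subgoals refer to.
        depends_on.setdefault(head, set()).update(p for p in body if p not in base)
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None  # a recursive family of clauses admits no such ordering

nonrecursive = [
    ("grandparent", ["parent", "parent"]),
    ("parent", ["father"]),
    ("parent", ["mother"]),
]
recursive = [("even", ["odd"]), ("odd", ["even"])]

ordering = order_predicates(nonrecursive, base={"father", "mother"})
no_ordering = order_predicates(recursive, base=set())
```

For the nonrecursive family the group for `parent` precedes the group for `grandparent`; for the mutually recursive family no ordering is returned.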

One additional complication may lead to meaningless expressions. The complication
involves the occurrence of variables in the head which don’t appear in
the body, or only appear in built-in predicates. Semantically this means that
the values assigned to the corresponding attributes in the relational table for
this predicate have no designated domain and thus give rise to potentially
unbounded relations. The problem is illustrated by rules like: mortal(Everybody).
or larger(X,Y) :- X > Y.

The easy way out is to prohibit such unbounded relations by enforcing a
so-called safety condition on Prolog programs. The condition stipulates that
in each rule all variables occurring in the head occur bounded in the body.
Constants, and variables occurring in a basic predicate or in a predicate earlier
in the order of definitions in the body, occur bounded in the body. The same
holds for a variable occurring in a built-in predicate whose semantics enforces
this variable to be functionally dependent on other bounded variables. All other
variables are unbounded.
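
A check for this safety condition might look as follows. This is a minimal sketch under stated assumptions: a subgoal is encoded as a pair (predicate name, argument list), variables are strings starting with an upper-case letter, and the only built-in that expresses a functional dependence is a hypothetical `is` subgoal whose first argument is computed from the remaining ones. None of these encodings come from the original system.

```python
BUILTINS = frozenset({"<", ">", "=<", ">=", "is"})

def is_var(term):
    """Prolog convention: identifiers starting with an upper-case letter are variables."""
    return isinstance(term, str) and term[:1].isupper()

def bounded_variables(body):
    """Collect the variables that occur bounded in the body of a rule."""
    # Variables in relational (non-built-in) subgoals are bounded.
    bound = {a for pred, args in body if pred not in BUILTINS
             for a in args if is_var(a)}
    # A variable computed by "is" from bounded variables is bounded as well.
    changed = True
    while changed:
        changed = False
        for pred, args in body:
            if pred != "is":
                continue
            target, *sources = args
            if (is_var(target) and target not in bound
                    and all(s in bound for s in sources if is_var(s))):
                bound.add(target)
                changed = True
    return bound

def is_safe(head_args, body):
    """A rule is safe if every head variable occurs bounded in the body."""
    bound = bounded_variables(body)
    return all(a in bound for a in head_args if is_var(a))

# mortal(Everybody).                         (unsafe)
unsafe_fact = is_safe(["Everybody"], [])
# larger(X,Y) :- X > Y.                      (unsafe)
unsafe_rule = is_safe(["X", "Y"], [(">", ["X", "Y"])])
# total(X,Z) :- num(X), num(Y), Z is X + Y.  (safe)
safe_rule = is_safe(["X", "Z"],
                    [("num", ["X"]), ("num", ["Y"]), ("is", ["Z", "X", "Y"])])
```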

The translation of Horn-clauses into view definitions as described by Ull- 
man consists of three stages. 



® Our safety condition follows UDI; it is slightly more general than the one given by 
Ullman m, who requires equality instead of functional dependence.

In the first stage a relational expression is constructed for every subgoal based
on a relational predicate; the built-in subgoals are treated separately during the
second stage. These relational expressions basically consist of the relation or
view representing the predicate in the subgoal. However, due to the occurrence
of constant arguments or multiple occurrences of a single variable the SELECT
operator has to be invoked.

In the second stage the various expressions corresponding to the subgoals are
joined together to yield an expression corresponding to the extent of the body of
the rule. Built-in predicates (in this context primarily (in)equalities on numeric
arguments) are translated to another select. Moreover, in order to accommodate
the positionally specified arguments in the subgoals in a relational algebra
based on named attributes, the expression corresponding to a subgoal will
frequently have to be renamed. RENAME is also essential in order to arrange
that variables shared between subgoals correspond to common columns in
a JOIN expression.

In the third and last stage the contributions of the various rules for a predicate
must be put together to provide an expression which defines this predicate. A
UNION of the body expressions of the rules, after a PROJECT on the variables
occurring in the head of each rule, will suffice in case these expressions are
union-compatible and have the same signature as the head predicate. However, due
to the occurrence of repeated variables and constants in the head of individual
rules these expressions in general are not union compatible. Fortunately, the
semantic impact of such occurrence patterns in the head of a rule can also be
expressed by extending the body with a few additional built-in subgoals. Ullman
uses a preprocessing stage called rectification of rules which solves this problem.
The impact is that the expressions for the body will include some more select
conditions.
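
The three stages can be illustrated by a small Python sketch. It is not the compiler described here: instead of emitting a view definition it evaluates the operators directly on in-memory sets, and the positional column names `c0`, `c1`, ..., the rule encoding, and all function names are assumptions chosen for the example. Rules are assumed to be rectified, so that the head consists of distinct variables only.

```python
def is_var(term):
    """Prolog convention: identifiers starting with an upper-case letter are variables."""
    return isinstance(term, str) and term[:1].isupper()

def table(tuples):
    """Stored table with positional column names c0, c1, ... (assumed convention)."""
    return {frozenset((f"c{i}", v) for i, v in enumerate(t)) for t in tuples}

def select(rel, condition):
    return {row for row in rel if condition(dict(row))}

def present(rel, renaming):
    """Project on the keys of `renaming` and rename them (in the spirit of PRESENT)."""
    return {frozenset((new, dict(row)[old]) for old, new in renaming.items())
            for row in rel}

def join(left, right):
    """Natural join on all shared attribute names."""
    result = set()
    for a in map(dict, left):
        for b in map(dict, right):
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                result.add(frozenset({**a, **b}.items()))
    return result

def subgoal(rel, args):
    """Stage 1: SELECT on constants and repeated variables, rename columns to variables."""
    cols = [f"c{i}" for i in range(len(args))]
    first = {}  # variable -> column of its first occurrence
    for col, arg in zip(cols, args):
        if is_var(arg):
            first.setdefault(arg, col)
    def condition(row):
        return all(row[col] == (row[first[arg]] if is_var(arg) else arg)
                   for col, arg in zip(cols, args))
    return present(select(rel, condition), {col: var for var, col in first.items()})

def rule(head_vars, body, db, builtins=()):
    """Stage 2: JOIN the subgoals, SELECT on built-ins, project on the head variables."""
    expr = {frozenset()}  # a single empty row is neutral for the natural join
    for pred, args in body:
        expr = join(expr, subgoal(db[pred], args))
    for condition in builtins:
        expr = select(expr, condition)
    return present(expr, {var: f"c{i}" for i, var in enumerate(head_vars)})

def predicate(rules, db):
    """Stage 3: UNION of the expressions of all rules for one predicate."""
    result = set()
    for head_vars, body, builtins in rules:
        result |= rule(head_vars, body, db, builtins)
    return result

db = {"parent": table([("ann", "bob"), ("bob", "cid"), ("cid", "dan")])}
# grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
db["grandparent"] = predicate(
    [(["X", "Z"], [("parent", ["X", "Y"]), ("parent", ["Y", "Z"])], ())], db)
```

Because the derived relation is stored under the same positional convention as a base table, a later rule can use `grandparent` in a subgoal exactly like `parent`.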

From the above description we can infer the requirements on the functionality 
to be satisfied by the RDMS in order to support a Datalog compiler as intended. 

1. The translation requires that the relational operators SELECT, PROJECT,
JOIN, RENAME and UNION can be used in expressions in an arbitrarily nested
way. Consequently, with a system like the earlier version of SQL, which
requires forming a Union by means of a process rather than an expression, life
will be quite unpleasant...

2. If systems of rules, rather than rules for a single predicate have to be trans- 
lated conveniently, one needs to be able to invoke defined views on the same 
positions as basic tables. Views must be first class citizens. 

3. In order to be able to add, remove and modify rules during a session the 
corresponding views must be inserted, deleted and modified in the database, 
without having to restructure the entire database after every change. This 
requires that the catalogue storing views and their definitions should be 
readily accessible and updatable during runtime. 

As indicated before BS12 fully satisfied these three requirements. The PRESENT
operator combines the functionality of a project and a renaming operator
and the other three required operators, SELECT, JOIN and UNION, are available.
In the real world, the SQL based systems available in 1984 violated them all (a
situation which has hardly improved since, even in the more recent SQL2 standard).
This illustrates why BS12 was the ideal RDMS for building our prototype
and also explains why other groups never opted for a fully compilation based
approach to deductive databases.



3 The Aftermath; Extensions of the Functionality of the 
Compilation Based Approach 

Full Datalog includes a feature not covered by the translation process as descri- 
bed in the previous section: Recursive definition of predicates. Since Deductive 
Databases were rather popular in an academic environment where recursion is 
a well understood mechanism it was the first feature one wanted to add to the 
interface. 

Full Prolog uses function symbols for building complex terms which are used 
for representing complex entities. These structures show a similarity with the 
structures in the nested relational model. A second extension involves the
migration of our compilation based approach to a nested relational model.

Prolog itself supports the use of negated subgoals, by means of the negation
by failure strategy. A much milder form of negation is offered in the relational
database model by the DIFFERENCE operator. Therefore one wants to investigate 
whether some “tame” use of the negation operator can be added to the interface. 

Finally, having numerical values in a Database is hardly useful if you can’t 
compute on these values, preferably by invocation of symbolic expressions. One 
would like to see the functionality offered by spreadsheet programs to appear in 
a database system where the numeric constraints are described symbolically in 
the database scheme. This extension of functionality is investigated in the field 
called Constraint Logic Programming today. 

Between 1984 and 1997 we have been involved with research projects aimed 
at the above extensions. Recursion and (nonrecursive) terms have been processed
on our prototype system. For the negation the solution which is described in 
theory |2Z! evidently could have been implemented on our prototype, but we 
never did so. Finally, the extension with constraints became the focal point of 
the RL-project, which was initiated at the IBM San Jose Research Center in
1985 and which was continued at the University of Amsterdam. 

In this section we will discuss the first two extensions, since they were
investigated in our project. For the negation problem we refer to the presentation of
the problem and its solution for the class of Stratified Programs in Ullman.
With respect to the constraints we emphasize the tight connection between our 
project and the subsequent RL-project; for more technical details on RL and its 
prototype implementation see lam. 



The IBM San Jose Research Center is now known as the IBM Almaden Research Center.

3.1 Beyond Relational Algebra: Recursion

Recursive definitions of predicates in Prolog are no more a part of logic than
recursive definitions are a part of the branch of mathematics called Algebra. They
violate the basic principle in mathematics that defining an object in terms of
itself is disallowed. Yet logicians, computer scientists and mathematicians have
learned to understand and operate on notions defined using recursion. These
“definitions” are better understood as equations, and equations may have a
solution. In fact, in order to be meaningful these equations had better have a unique,
or at least a preferred, solution. As it turns out this frequently happens to be the
case (at least in those situations where recursive definitions nowadays are
commonly used, since otherwise...).

The existence of a preferred solution in computer science often can be based
on the semantic principle attributed to Knaster and Tarski, known as the least
fixed point principle. If interpreted over some abstract mathematical structure
$\Omega$ a recursive definition can be interpreted by means of an equation of the form

$$X = \Phi(X)$$

where $\Phi$ denotes some functional over $\Omega$, a mapping from $\Omega$ into itself. Assume
that $\Omega$ is a structure on which some partial order $\sqsubseteq$ is defined, such that there
exists a unique minimal element $\bot \in \Omega$, and such that least upper bounds for
countable increasing chains are defined: if $X_0 \sqsubseteq X_1 \sqsubseteq X_2 \ldots$ is a countable
increasing chain then $\bigsqcup_{i=0}^{\infty} X_i$ denotes this least upper bound.

The functional $\Phi$ is called monotonous if $X \sqsubseteq Y$ implies that $\Phi(X) \sqsubseteq \Phi(Y)$.
The functional $\Phi$ is called continuous if it satisfies

$$\Phi\Bigl(\bigsqcup_{i=0}^{\infty} X_i\Bigr) = \bigsqcup_{i=0}^{\infty} \Phi(X_i)$$

for every countable increasing chain $X_0 \sqsubseteq X_1 \sqsubseteq X_2 \ldots$

The Knaster Tarski principle states that for a monotonous continuous functional
the equation $X = \Phi(X)$ has a minimal fixed point, which moreover can
be computed by $\bigsqcup_{n=0}^{\infty} \Phi^n(\bot)$ where $\Phi^n(\bot)$ denotes n-fold iteration of $\Phi$ on
$\bot$.

This principle has been used to assign meaning to recursive procedures in 
programming theory with great success 0, where it forms the theoretical foun- 
dation of the Scott induction rule. 

For the semantics of Prolog rules, and its interpretation in the relational
database model, we take for $\Omega$ the family of relations within a (fixed) Cartesian
product of domains, where $\sqsubseteq$ denotes set-inclusion and $\bot$ denotes the empty
relation. It is easy to see that the five relational operators invoked in our translation
process: SELECT, PROJECT, JOIN, RENAME and UNION, all are monotonous and 
continuous, and so are the complex expressions produced by our translator. 

In applying this principle one has to be aware that in general the use of recur- 
sion in Prolog invokes a set of predicate symbols rather than a single recursive 
predicate. After translation this would define a collection of relations in terms 
of themselves. However, by forming the Cartesian product of these relations, the 
system can be replaced by a single relation having a recursive definition, after 
which the Knaster Tarski principle can be applied.

Hence we know what the meaning of a recursively defined relation should be.
Moreover, the Knaster Tarski principle tells us how to compute this meaning.
Starting with the empty relation we repeatedly submit the result obtained so
far to the functional until no new tuples appear in the result. Where in the
theoretical model this might require an infinite number of steps, in the database
the process is guaranteed to terminate after finitely many steps, due to the fact
that after all the domain $\Omega$ is finite. Thus increasing chains must become stable
at some point, and it is easy to see that the first chain element $X_i$ satisfying
$X_i = \Phi(X_i)$ indeed equals the minimal fixed point.
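
The iteration just described can be written down in a few lines. The following Python sketch is an illustration only: the sample `PARENT` table and the functional `phi` for a hypothetical ancestor predicate are assumptions, and the join is spelled out as a set comprehension rather than as a relational expression.

```python
def least_fixed_point(phi):
    """Iterate phi from the empty relation until the result no longer changes."""
    current = frozenset()  # the minimal element: the empty relation
    while True:
        following = phi(current)
        if following == current:
            return current  # the first X with X = phi(X)
        current = following

PARENT = frozenset({("ann", "bob"), ("bob", "cid"), ("cid", "dan")})

def phi(ancestor):
    """Functional for:  ancestor(X,Z) :- parent(X,Z).
                        ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)."""
    joined = {(x, z) for (x, y) in PARENT for (y2, z) in ancestor if y == y2}
    return PARENT | joined

ANCESTOR = least_fixed_point(phi)
```

Since `phi` is built from a join and a union only, it is monotonous, and the loop stops as soon as two successive results coincide.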

From a practical perspective this evaluation method (the so called naive 
evaluation scheme) is highly inefficient since it will recompute the same tuples 
over and over again. The semi-naive evaluation scheme is an attempt to prevent 
some of these recalculations by rewriting the functional in such a way that only 
the “incremental part” is computed. 
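
For a rule that is linear in the recursive predicate, the semi-naive idea can be sketched as follows. The function name and the ancestor example are assumptions made for illustration; in each round only the tuples that were new in the previous round are joined with the base table.

```python
def ancestors_semi_naive(parent):
    """Transitive closure where each round joins only the newly found tuples."""
    total = set(parent)  # everything derived so far
    delta = set(parent)  # tuples that were new in the last round
    while delta:
        derived = {(x, z) for (x, y) in parent for (y2, z) in delta if y == y2}
        delta = derived - total  # keep only the incremental part
        total |= delta
    return total

chain = {("ann", "bob"), ("bob", "cid"), ("cid", "dan")}
closure = ancestors_semi_naive(chain)
```

The loop ends when a round produces no new tuples, which also covers cyclic data.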

Much effort has been invested in building interfaces on databases capable of 
dealing with recursive definitions. In our project I2DI we have performed some 
experiments with recursive rules using a compilation based approach. Rather
than resubmitting the intermediate results as a table to the functional we compile
a series of increasingly complex relational algebra expressions, each representing
the result of k-fold application of the rule. This approach was
enabled due to the architecture of BS12 where the user has access to his view 
definitions in the form of language tables. Furthermore the optimizer in BS12, 
which operated on some tree representation of the algebraic expressions, worked 
in such a dynamic fashion that the recursive expansion of the expressions and the 
optimization worked together in an incremental fashion. This made it possible 
to optimize these expressions at construction time. Once more the architecture 
of BS12 enabled us to investigate an approach followed by few others. . . 
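
The flavour of this expression-level unfolding can be conveyed by a small sketch. It is not the BS12 mechanism: expressions are represented as nested Python tuples, the simplification rules for the empty relation stand in for the optimizer, renaming of columns is ignored, and all names are hypothetical.

```python
EMPTY = ("EMPTY",)
PARENT = ("TABLE", "PARENT")

def union(a, b):
    """Build a UNION node, simplifying away the empty relation at construction time."""
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return ("UNION", a, b)

def join(a, b):
    """Build a JOIN node; a join with the empty relation is empty."""
    if EMPTY in (a, b):
        return EMPTY
    return ("JOIN", a, b)

def ancestor_step(x):
    """One application of the rule body: PARENT union (PARENT join x)."""
    return union(PARENT, join(PARENT, x))

def unfold(step, k):
    """Expression for the k-fold application of `step` to the empty relation."""
    expr = EMPTY
    for _ in range(k):
        expr = step(expr)
    return expr

def show(expr):
    """Render an expression tree as text."""
    if expr == EMPTY:
        return "EMPTY"
    if expr[0] == "TABLE":
        return expr[1]
    return f"{expr[0]}({', '.join(show(e) for e in expr[1:])})"

second = show(unfold(ancestor_step, 2))
```

Each further unfolding wraps the previous expression in one more JOIN and UNION, so the expressions grow with k while simplification is applied as they are built.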



3.2 Beyond Datalog: Terms 

Where Datalog supports only constants and variables as arguments in predicates,
full Prolog also allows terms built by application of functions to a number of
terms. Semantically a term violates the first normal form condition in databases
where all attribute values have to be atomic values. 

A typical example of a pair of Prolog facts illustrating terms is given by:
owns(Lucy, Mug). in combination with owns(Lucy, book(Homer, Ilias)).

These two facts show that Lucy owns a mug for which no structure information
is available, and a book with given author and title. In a database, property
of the first sort is easy to represent; just store a tuple (Lucy, Mug) in the OWNS
table. Property of the second type requires that one introduces another table
BOOK for storing information about books, and subsequently stores in the OWNS
table a tuple (Lucy, X), where X is something like a foreign key value referring to
the BOOK table. At a slightly higher level of abstraction one would consider this
value X to be an object identifier. But then we are leaving the relational model,
proceeding towards something like a nested relational or fully object oriented
model.

This example illustrates a rather evident strategy for bringing Prolog terms
within the scope of our compiled approach. However, several problems remain
to be solved:

— Storing a foreign key or an object identifier requires some mechanism for 
generating these values, and since in the Prolog world these values are not 
required it becomes the task of the database system to do so. It is questiona- 
ble whether the database will be able to figure out whether it is attempting 
to create a new object identifier for a tuple which it has seen already before: 
duplicate removal becomes a far more serious issue, like it is in the world of 
Object Orientation. 

— Inventing a domain containing the entities which can become property is non- 
trivial since property can both be unstructured and structured. In Prolog 
this is no problem since Prolog is an untyped language. 

In 1986 S.J.C. Elbers, a student in Mathematics at the University of Am- 
sterdam, was given the assignment for his master thesis to construct at IBM 
Uithoorn an extension of the prototype of our Prolog-BS12 interface which would
incorporate terms uni- We knew that the first problem would be easy to solve 
given the fact that BS12 in its internal structure generates object identifiers 
(called pseudokeys) for all tuples anyhow. It was just a matter of making these 
pseudokeys available to the user of the system. The developers at IBM at that
time turned out to be willing to design the required patch to their system for the
purpose of our project (keeping in mind that it might be of use for themselves
as well...).

The second problem was solved by systematically replacing columns by a 
collection of three columns, thus supporting the choice between an atomic value 
or an object identifier for all attributes. For atomic values the first column stores 
the value and the remaining two obtain a default value. For object identifiers 
the first column is a default value and the second and third column contain a 
table reference and a pseudokey value for that table respectively. 
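
The three-column encoding can be sketched as follows. This is an illustration under assumptions: terms are written as Python tuples (functor, argument, ...), `None` stands in for the default value, the pseudokey is simply the position of a row in an in-memory list, and the function names are hypothetical. The separation between predicate and argument information discussed further on is not modelled.

```python
DEFAULT = None  # stands in for the default value of a column

def encode(value, store):
    """Return the (atomic value, table reference, pseudokey) triple for one argument."""
    if isinstance(value, tuple):  # a term, written as (functor, arg, ...)
        functor, *args = value
        row = tuple(cell for arg in args for cell in encode(arg, store))
        rows = store.setdefault(functor, [])
        if row not in rows:  # duplicate removal: reuse an existing row
            rows.append(row)
        return (DEFAULT, functor, rows.index(row))
    return (value, DEFAULT, DEFAULT)  # an atomic value

def add_fact(predicate, args, store):
    """Store a fact; every argument occupies three columns."""
    row = tuple(cell for arg in args for cell in encode(arg, store))
    rows = store.setdefault(predicate, [])
    if row not in rows:
        rows.append(row)

store = {}
add_fact("OWNS", ["Lucy", "Mug"], store)
add_fact("OWNS", ["Lucy", ("BOOK", "Homer", "Ilias")], store)
```

After these two calls the OWNS table holds one row with an atomic second argument and one row whose second argument refers to row 0 of the BOOK table.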

With this expansion of tables it turned out to be possible to extend our 
compiler in such a way that nonrecursive terms could be dealt with resulting in 
queries which provided the right answer. However we stumbled on yet another 
problem, this time having to do with Prolog itself. 

Prolog, in its syntax, hardly discriminates between functions and predicates.
In the above example we might add a fact expressed by book(Vergil, Aeneas).
This would be stored as an entry in the BOOK table as well. But a subsequent
query for the existing books, ?- book(X,Y), would now return the result
consisting of the Aeneas only. Due to the fact that the Ilias is mentioned only as
a term, unification with the predicate would fail for the Ilias. Evidently in our
database table storing books this distinction would be lost.

In order to preserve this (counterintuitive?) behavior of the Prolog system
Elbers had to divide each relation in the database into two sections, one
corresponding to the predicate information and another representing the so-called
argument information.

The resulting system became rather complicated, but it worked. Proving
its correctness, however, became hard. Elbers has argued for its correctness by
modeling his database design in Prolog, yielding a Flat Prolog version of a
given program. Next one could show that the Flat Prolog version of the program
always returns the same answers as the original one. Subsequently, showing that
the compiler preserves the meaning of the Flat Prolog program was not hard.

In this extension recursive terms were disallowed; they could give rise to 
cyclic structures which would force the database system into an infinite loop if 
traversed. 



3.3 Constraints and Elimination: The RL-Project 

Having numerical attributes in a database creates the wish to compute on such 
numerical values. Standard databases provide a restricted support for doing so. 

On the one hand there are simple calculate operations where a new numeric 
attribute value is added to each tuple in a relation according to an explicitly 
specified formula (as implemented by the CALCULATE operator in BS12). The 
calculate operator doesn’t belong to the core of relational algebra as required for 
relational completeness. It is unclear whether relations extended by a calculate 
can be subjected to further algebraic operations. For example: is it permitted
for a user to perform a select on an attribute value which is computed to be the
sum of two others?

On the other hand one has so-called aggregate operators where a table is
grouped on the basis of the values of one attribute and for each resulting group
some other attribute is summed up to a total. The collection of aggregation operators
which may be invoked is fixed and depends on the database system selected:
summation, maximum, minimum, counting and sometimes logic operations like
exists and forall. These operators don’t belong to relational algebra and are in
general not expressible in first order logic. The main reason for aggregates to
exist in systems is that they are needed in practical applications like generating
reports for companies. Needless to say, in most systems the user is not able
to define his own aggregate operators (Stonebraker’s Illustra being a notable
exception).

As far as support for computations is offered it lacks one feature which is
essential for databases: reversibility. When designing a database scheme one
doesn’t yet know what sort of queries will be submitted to the database. The
choice of which attributes will be supplied as input values in a query and which
attributes will be asked for as output is made when the query is submitted, and
at that time the system will have to invent a navigation strategy through the
available tables in order to generate the answers.

However, a calculate instruction is always understood to be evaluated in
the direction given by the definition. Most systems lack the intelligence needed
to infer that given attributes A, B, C, where C is calculated by C = A + B, the
output value of B can be obtained from the input values of A, C by rewriting the
calculate into B = C - A.

A more complicated situation involves systems: given the calculate instructions
C = A + B and D = A - B, our knowledge of elementary mathematics
teaches us that we can compute the values of A, B from those of C, D, but a
symbolic equation solver is needed for a system to discover this fact.

In Prolog the situation is about as bad. The built-in predicate sum(A,B,C)
can only be invoked to obtain a binding for C once bindings for A and B have
been given. Symbolic equation solving is out of the question.

Nowadays there exist extensions of logic programming which provide this
additional functionality (the field of Constraint Logic Programming), but around
1986 not much work had been done. Moreover, the research done aimed
at extending Logic Programming with constraint capabilities, and not at bringing
constraints within the scope of database technology. Yet such systems would be
of great value for applications describing a physical system like an electronic 
circuit, a business application computing the impacts of various tax regulations, 
or a physiological model describing the blood circulation in a human body. In 
such models there are far more quantities in the model than actually will be 
stored in a table of data. Some quantities are fixed constants which preferably 
are written symbolically, some indeed are observable and known, some are un- 
known but wanted, and there are values which are unobservable, unknown and 
unwanted which simply arise as intermediate values in the equational model. 

Full declarative reversibility means in this context that one describes the 
model by giving the system of equations, and submitting the information about 
the partition in known, wanted and intermediate variables at a later stage. The 
system must invent an evaluation strategy to compute the wanted variables from 
the known ones, eliminating the intermediate ones during the process. 
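
For linear constraints, this kind of reversibility can be demonstrated with ordinary Gaussian elimination. The sketch below is not the RL equation solver: it handles linear equations only, does not detect inconsistent systems, and the encoding of an equation as a pair (coefficient dictionary, constant) as well as the function name `solve` are assumptions made for the example.

```python
from fractions import Fraction

def solve(equations, known, wanted):
    """Compute the wanted variables from the known ones.

    equations: list of (coefficients, constant) meaning sum(coef * var) = constant.
    known:     dict of variable -> value supplied with the query.
    wanted:    variables asked for; all other variables are intermediate.
    """
    unknowns = sorted({v for coeffs, _ in equations for v in coeffs} - set(known))
    rows = []
    for coeffs, const in equations:
        # Move the contribution of the known variables to the right-hand side.
        rhs = Fraction(const) - sum(Fraction(c) * known[v]
                                    for v, c in coeffs.items() if v in known)
        rows.append([Fraction(coeffs.get(v, 0)) for v in unknowns] + [rhs])
    pivot_row = {}
    r = 0
    for col, var in enumerate(unknowns):  # Gauss-Jordan elimination
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][col] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_row[var] = r
        r += 1
    result = {}
    for var in wanted:
        if var in known:
            result[var] = Fraction(known[var])
            continue
        if var not in pivot_row:
            raise ValueError(f"{var} is not determined by the known values")
        row = rows[pivot_row[var]]
        if any(row[c] != 0 for c, v in enumerate(unknowns) if v != var):
            raise ValueError(f"{var} is not determined by the known values")
        result[var] = row[-1]
    return result

# C = A + B and D = A - B, written as C - A - B = 0 and D - A + B = 0.
MODEL = [({"C": 1, "A": -1, "B": -1}, 0), ({"D": 1, "A": -1, "B": 1}, 0)]

forward = solve(MODEL, known={"A": 7, "B": 3}, wanted=["C", "D"])
backward = solve(MODEL, known={"C": 10, "D": 4}, wanted=["A", "B"])
```

The same model answers both queries; only the partition into known and wanted variables differs between the two calls.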

In 1985 the second author held a visiting scientist position for eight months 
at IBM San Jose, as a member of a research group on Office Automation chai- 
red by Peter Lucas. The aim was to establish a repository for business rules, 
to be accessed by multiple application programs, similar to the way a Database
stores raw data. Business rules were to be understood to involve primarily
algebraic numeric constraints (Austrian Social Security regulations being a
prototypical example). It became the author’s task to invent a language for expressing
contents of the repository. The resulting proposal for a language called RL ap- 
peared in HS|. This report presents a syntax for rules, and a semantic model 
for their meaning. The model was highly influenced by our experience from the 
Prolog/BS12 interface. 

Rules could be given in three formats: Tabular rules correspond to relational 
database expressions (view definitions); Clauses correspond to Horn-clauses in 
Datalog and Constraints represent the algebraic constraints on symbolic attri- 
butes. Both Syntax and Semantics are based on the principle Everything is a 
relation but not all relations are equal. For example both tabulars and clauses 
describe named relations whereas the constraints describe an unnamed “world” 
relation. Tabulars are given as a program-wide global definition, whereas new 
clauses for an existing predicate can be given. Adding a clause behaves
semantically as a union, whereas adding a constraint behaves as an intersection.

Reception of the proposed language was negative. Implementation of the
ideas was expected to require at least three years. No activities in that direc- 
tion took place at IBM. In fact the Office Automation project was killed several 
months after we left IBM. Still the RL project was revived at the University of 
Amsterdam. For his master thesis in Computer Science S.J. van Denneheuvel had 
written a symbolic equation solver in Prolog, to be used in some entirely unrela- 
ted project. Learning about RL he became interested in actually implementing 
it, and his equation solver turned out to be exactly the ingredient needed to do 
so. Construction of a small prototype with the intended functionality required 
only a few months. A more extended implementation and the related theory
became the topic of van Denneheuvel’s Ph.D. thesis. Papers describing the language,
its implementation and an example of its use in a medical model are !iniiii‘A‘i 

It is relevant to observe that this system once more is compilation based. 
A query in RL is analyzed by the symbolic equation solver, and subsequently 
translated into a select-join-calculate-project expression which can be submitted 
to a database system. BS12 would have been an ideal system to serve as a 
database back-end, but by the time the project arrived at this point, IBM in the 
Netherlands had lost all interest in experimenting with BS12 (the official policy 
at this time was to retire BS12 as soon as possible. . . ) 

After completing his thesis van Denneheuvel built a front-end to his system
which compiled RL/1 to SQL. This tool turned out to be useful even in the
case no constraints were used, since the RL/1 code for a database application
is far more compact than its SQL equivalent. The language was used as a rapid
prototyping tool in the early stage of a major development project of Syllogic,
a high tech software corporation in the Netherlands. Its poor performance,
however, made the system useless for real applications in practice.

The last RL related project involves an Object Oriented extension called 
OORL which has been investigated in the years 1994-1997 by Dr. E. Rotterdam
as a postdoc at the University of Amsterdam. The report ^3] describing this 
language which offers a declarative window on the inherently imperative world 
of Objects has not yet been finalized. 

Conclusion 

The projects we were involved with in the period 1983 until 1997 have proven that
a compilation based approach to deductive databases is, in theory, possible. We
never got to the point of determining whether the approach would yield acceptable
performance for real life applications; this would require far more investment
than we were able and motivated to spend. We also discovered that, starting
from the clean mathematical model for the intended semantics, functionality
from areas beyond Datalog could be added, preserving a reasonable amount of
transparency of the languages used.

These projects were made possible by circumstances which allowed us to disregard
the pressure to follow the main stream approaches investigated elsewhere.
Moreover we found in BS12, a system which had been developed under a similar
climate of independence, an ideal tool to perform experiments with our ideas.

Syllogic is now a wholly owned subsidiary of Perot Systems.



Abstract. Classical mathematics has been a source of ideas for Computer
Science since its very first days. Surprisingly, there is still much to be
found. Computer scientists, especially those in Theoretical Computer
Science, find inspiring ideas both in old notions and results and in
20th century mathematics. Recent decades have brought us evidence
that computer people will soon study quantum physics and modern
biology just to understand what computers are doing.



1 Introduction 

Twenty years ago graduates of my University used to say: “Why were we taught
calculus and algebra? They are not at all needed in a programmer’s work.” Of
course, this saying characterized their workplaces more than the profession. The
main goal of my talk today is to show that contemporary Computer Science
needs a great deal of knowledge which is not considered Computer Science.

I might mention Quantum Physics and Biology, but I will concentrate mainly on
Classical Mathematics. I will try to show that Classical Mathematics has much
to say to Computer Scientists. I even risk predicting that the next generation of
Computer Science students will learn much more Classical Mathematics than
the students now in the University.

My talk consists of several stories connected only in one way. All these stories 
show that Classical Mathematics has much to offer to us. Mostly, this is a survey 
of known results. However Theorem 01 seems to be new. 

2 Models of Computation 

Can we presume that the human brain is purely deterministic? Perhaps not.
Indeed, assume to the contrary that it is. In this case the following paradox of
responsibility arises. If I do something wrong, what can I be responsible for?
All my reactions have been genetically preprogrammed in my organism, and
nothing depends on me. However, if somebody believes I am still responsible
for something, then this person assumes there is more in my brain than simple
deterministic reactions to the environment.

* Research supported by Grant No. 96.0282 from the Latvian Council of Science 

Theoretical Computer Science long ago developed a system of notions 
generalizing determinism. Nondeterministic and probabilistic algorithms were the 
first generalizations. 

Nondeterministic machines constitute an important part of the Theory of 
Computation. However, nondeterminism does not provide, and cannot provide, 
any mechanism for making the choices. 

Probabilistic algorithms were first used during WWII for simulation and 
numerical analysis. Probabilistic Turing machines were introduced by de Leeuw et 
al. in |ZE|. The main result of that paper shows that every function computed 
by a probabilistic machine can be computed by a deterministic machine as well. 
This result seems to have a devastating effect on our discussion about the 
nature of the human brain. If the human brain can be considered a computing device 
with unrestricted computational resources, then by m a probabilistic brain can 
compute no more functions than a deterministic one, and our consideration of 
the brain as a probabilistic device does not help us to solve the responsibility 
problem. The probabilistic brain also cannot be responsible for anything because 
its reactions, however probabilistic, are nonetheless genetically preprogrammed. 

M. Rabin |21| introduced probabilistic 1-way finite automata. They can be 
described by stochastic matrices $A(i)$ of size $n \times n$, one for every letter $i$ of 
the input alphabet (where $n$ is the number of states of the finite automaton), a 
stochastic column-vector $\xi$ (the probability distribution among the states 
at the starting moment) and a 0-1 row-vector $\chi$ (describing which 
states are accepting and which states are rejecting). By $A(x)$ we denote the 
product of the matrices $A(i_s) \cdots A(i_2) A(i_1)$, where $i_1, i_2, \ldots, i_s$ are the symbols 
of the word $x$. The input word $x$ is accepted if and only if 

$$\chi A(x) \xi > \frac{1}{2}.$$

M. Rabin m considered separately the case of bounded-away probabilities 
(he called this case an isolated cut-point), when there is a positive number $\delta$ such 
that for an arbitrary input word $x$ either 

$$\chi A(x) \xi > \frac{1}{2} + \delta$$

or 

$$\chi A(x) \xi < \frac{1}{2} - \delta,$$

and proved that a 1-pfa with a probability bounded away from $\frac{1}{2}$ can recognize 
only regular languages, i.e. the same languages that are recognized by 1-way deterministic 
finite automata. If the human brain can be considered a computing device with 
maximally restricted computational resources (a 1-way finite automaton), then 
the same conclusion is obtained by M. Rabin's result m. 
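
The acceptance rule of a probabilistic 1-way finite automaton can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state automaton, the names `MATRICES`, `XI`, `CHI` and the helper functions are hypothetical and do not come from Rabin's paper. The matrices are taken column-stochastic so that they act on a column vector of state probabilities, and the cut-point is $\frac{1}{2}$.

```python
from fractions import Fraction


def acceptance_probability(matrices, xi, chi, word):
    """Return chi * A(i_s) ... A(i_1) * xi for the input word i_1 ... i_s."""
    state = list(xi)                      # column vector: current distribution
    for letter in word:                   # A(i_1) is applied first
        a = matrices[letter]
        n = len(state)
        state = [sum(a[r][c] * state[c] for c in range(n)) for r in range(n)]
    return sum(c * p for c, p in zip(chi, state))


# Hypothetical 2-state automaton; every column of every matrix sums to 1.
MATRICES = {
    # reading 'a' in state 0 moves to state 1 with probability 3/4
    "a": [[Fraction(1, 4), 0], [Fraction(3, 4), 1]],
    # reading 'b' leaves the distribution unchanged
    "b": [[1, 0], [0, 1]],
}
XI = [Fraction(1), Fraction(0)]           # start in state 0
CHI = [0, 1]                              # state 1 is the only accepting state


def accepts(word):
    """Accept iff the acceptance probability exceeds the cut-point 1/2."""
    return acceptance_probability(MATRICES, XI, CHI, word) > Fraction(1, 2)
```

In this toy automaton every acceptance probability is either 0 or at least 3/4, so the cut-point is isolated, and the recognized language (words containing at least one `a`) is regular, as the theorem requires.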

There has been much work done to find where the advantages of probabilistic 
algorithms over deterministic ones lie. R. Freivalds m proved that palindromes 
can be recognized by a single-tape probabilistic Turing machine in time $n \log n$, 
while every deterministic machine of this type needs $\Omega(n^2)$ time to recognize 
palindromes. After the famous probabilistic algorithms for primality testing by R. 
Solovay and V. Strassen m, the topic of probabilistic algorithms became 
increasingly popular. This also increased interest in comparing the capabilities of 
probabilistic, deterministic and nondeterministic machines. R. Freivalds m 
proved that probabilistic 1-way multitape and multihead finite automata can 
recognize languages not recognizable by deterministic or nondeterministic 
automata of these types. Similar advantages of probabilistic automata and machines 
over their deterministic counterparts were proved for 2-way finite 
automata, 1-way counter and pushdown automata, and 1-way Turing machines 
with limitations on running time, space, reversals, etc. However the compari- 
son of capabilities of probabilistic versus other types of machines is strongly 
limited by the well-known difficulty to prove good lower bounds for complexity 
of concrete (non-diagonal) languages. For instance, we know that probabilistic 
multitape Turing machines with arbitrary number of work-tapes can recognize 
in time 2 n languages not recognizable in time n m but, on the other hand, we 
are not able to separate any running time less than " from time 2 n for the 
same class of machines in spite of highly sophisticated techniques used PSI ■ 

Recently a new type of algorithms has appeared, namely, quantum 
algorithms. The Nobel prize-winning physicist Richard Feynman asked in m what 
effects the principles of quantum mechanics, especially the principle 
of superposition, can have on computation. He gave arguments showing that it might be 
computationally expensive to simulate quantum mechanics on classical 
computers. This observation immediately leads to a conjecture predicting enormous 
advantages of quantum computers over classical ones. R. Feynman left open 
even the crucial question whether or not quantum computers can compute any 
functions non-computable by classical (deterministic) computers. P. Benioff jSI 
gave early arguments on Feynman's problem, and later D. Deutsch HU 
introduced the commonly used notion of the quantum Turing machine and proved 
that quantum Turing machines compute exactly the same recursive functions as 
ordinary deterministic Turing machines do. 

Quantization was introduced by Max Planck in 1900 m. Planck 
assumed a discretization of energy. That was a bold step at a time when the 
continuum models of classical mechanics were predominant. 

Mathematically, all quantum mechanical entities are represented by objects 
of Hilbert spaces. A Hilbert space is a linear vector space over the field of 
complex numbers (with vector addition and scalar multiplication), together with a 
complex function for the scalar product. It seems that there has never been a 
satisfactory explanation why complex numbers are used (but not, say, quaternions 
or more exotic number fields). The simplest explanation might be that physicists 
have developed such a theory, and it works, while theories based on real numbers 
only do not explain all the experiments. 

Quantum mechanics differs very much from classical physics. It suffices 
to mention Heisenberg's uncertainty principle, asserting that one cannot measure 
both the position and the momentum of a particle simultaneously and precisely. There 
is a certain trade-off between the accuracy of the two measurements. Another 
well-known distinction between quantum mechanics and classical physics is the 
impossibility of measuring any object without changing the object. 

The fundamental atom of information is the quantum bit, henceforth 
abbreviated by the term 'qbit'. 

Classical information theory is based on the classical bit as its fundamental 
atom. This classical bit, henceforth called cbit, is in one of two classical states, 
$t$ (often interpreted as "true") and $f$ (often interpreted as "false"). In quantum 
information theory the most elementary unit of information is the qbit. 
To explain it, we first discuss a probabilistic counterpart 
of the classical bit, which we call here pbit. It can be $t$ with probability $\alpha$ and 
$f$ with probability $\beta$, where $\alpha + \beta = 1$. A qbit is very much like a pbit, with the 
following distinction: for a qbit, $\alpha$ and $\beta$ are not real but complex numbers with 
the property $\|\alpha\|^2 + \|\beta\|^2 = 1$. 

Every computation done on qbits is performed by means of unitary operators. 
One of the simplest properties of these operators shows that such a computation 
is reversible. The result always determines the input uniquely. It may seem to 
be a very strong limitation for such computations. Luckily this is not so. It is 
possible to embed any irreversible computation in an appropriate environment 
which makes it reversible. For instance, the computing agent could keep the 
inputs of previous calculations in successive order. 
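
A minimal Python sketch of these two facts, a qbit as a pair of complex amplitudes and the reversibility of a unitary step, might look as follows. The particular matrix `U`, the amplitudes in `q` and all function names are assumptions chosen for illustration only.

```python
import math


def apply(u, q):
    """Apply a 2x2 matrix u to a qbit q = (alpha, beta)."""
    return (u[0][0] * q[0] + u[0][1] * q[1],
            u[1][0] * q[0] + u[1][1] * q[1])


def dagger(u):
    """Conjugate transpose; for a unitary matrix this is its inverse."""
    return [[u[c][r].conjugate() for c in range(2)] for r in range(2)]


def measure_probabilities(q):
    """Probabilities of observing t and f: squared moduli of the amplitudes."""
    return (abs(q[0]) ** 2, abs(q[1]) ** 2)


r = 1 / math.sqrt(2)
U = [[r, 1j * r], [1j * r, r]]   # an arbitrary unitary matrix (assumed example)
q = (0.6, 0.8j)                  # |0.6|^2 + |0.8i|^2 = 1

after = apply(U, q)              # one computation step
restored = apply(dagger(U), after)   # undoing the step recovers the input
```

Because `U` is unitary, the squared moduli in `after` still sum to 1, and applying `dagger(U)` returns the original amplitudes up to floating-point rounding, which is the reversibility property described above.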

The following features of quantum computers are important (but far from 
the only characteristic features of them) . 

— Input, output, program and memory are represented by qbits. 

— Any computation (step) can be represented by a unitary transformation of 
the computer as a whole. 

— Any computation is reversible. Because of the unitarity of the quantum evo- 
lution operator, a deterministic computation can be performed by a quantum 
computer if and only if it is reversible. 

— No qbit can be copied. After the qbit is processed, its original form is 
no longer available. 

— Measurements may be carried out on any qbit at any stage of the compu- 
tation. However any measurement destroys the information. More precisely, 
the measurement turns a qbit into a classical bit with probabilities dependent 
on the qbit. 

— Quantum parallelism: during a computation (step), a quantum computer 
proceeds down all coherent paths at once. If managed properly, this may 
give rise to speedups. 

Quantum finite automata were introduced twice. First this was done by C. 
Moore and J.P. Crutchfield jS2|- Later in a different and non-equivalent way these 
automata were introduced by A. Kondacs and J. Watrous m- 

The first definition just mimics the definition of 1-way probabilistic finite 
automata, only substituting unitary matrices for the stochastic ones. Since complex 
numbers are now involved, multiplication by the row vector $\chi$ is replaced by a 
non-linear operation, "squaring the moduli of the amplitudes corresponding to the 
accepting states and totaling these real numbers", denoted below by $B_\chi(\cdot)$. 
The definition is as follows. A 1-way quantum finite automaton can be described 
by unitary matrices $A(i)$ of size $n \times n$ (in the case of automata with $n$ states), 
a complex column-vector $\xi$ (the amplitude distribution among the states at the 
starting moment) and a 0-1 row-vector $\chi$ (describing which states are accepting and 
which states are rejecting). The input word $x$ is accepted if and only if 

$$B_\chi(A(x)\xi) > \frac{1}{2}.$$

The second definition is a bit more complicated. The states of the automaton 
are divided into halting and non-halting ones. The transition to a new state is 
made exactly as prescribed by the first definition but if the automaton reaches 
a halting state, the final measurement is made immediately. 

The relation between the capabilities of quantum and other finite automata is 
not yet completely described. 1-way quantum finite automata recognize only 
regular languages (hence they cannot do more than deterministic automata), but 
quantum automata cannot recognize all the regular languages. For instance, 
the language $\{0, 1\}^*1$ is not recognizable by any 1-way quantum finite 
automaton |2Z|. On the other hand, for some languages quantum automata can have 
much lower complexity. It is proved by A. Ambainis and R. Freivalds |2| that, 
for an arbitrary prime number $p$, there is a quantum 1-way finite automaton with 
$O(\log p)$ states recognizing the language "the length of the input word is a 
multiple of $p$" with a bounded probability of success, while every deterministic and 
even probabilistic 1-way finite automaton needs $p$ states. 

It is well known that if a language can be recognized by a probabilistic finite 
automaton with some fixed probability strictly exceeding $\frac{1}{2}$, 
then for arbitrary $\varepsilon > 0$ the same language can be recognized with a probability 
$1 - \varepsilon$. For quantum finite automata this is not so. A. Ambainis and R. Freivalds 
0 show an example of a language which can be recognized by a QFA with a 
probability 0.65 but which cannot be recognized with a probability exceeding 
0.9. The key word used most often in these proofs is the Fourier transform. 

Quantum automata might have remained a lesser-known, unusual modification of 
the standard definitions, but two events caused a drastic change. First, P. Shor 
invented surprising polynomial-time quantum algorithms for the computation of 
discrete logarithms and for the factorization of integers |S2|. Second, joint research 
by physicists and computer people has led to a dramatic breakthrough: all the 
unusual quantum circuits having no classical counterparts (such as quantum bit 
teleportation) have been physically implemented. Hence universal quantum 
computers are to come soon. Moreover, since modern public-key cryptography 
is based on the intractability of discrete logarithms and factorization of integers, 
building a quantum computer implies building a code-breaking machine. 

The above-mentioned features of quantum computers seem unusual and hence 
one may think that their advent is highly unlikely. On the other hand, in 
recent years physicists have performed a series of crucial experiments showing 
that all the basic elements needed for quantum computers can indeed be 
implemented. A quantum computer with 1 qbit of memory has been built at the IBM 
Almaden Research Center and a quantum computer with 4 qbits of memory has 
been built at Los Alamos National Laboratory. 

Another most unusual computation device has emerged quite recently. L. 
Adleman Q succeeded in solving the directed Hamiltonian path problem 
solely by manipulating DNA strings. DNA (deoxyribonucleic acid) is perhaps 
the most popular of the molecules in organic chemistry. DNA is responsible for 
keeping and transmitting genetic information. Adleman's algorithm solves 
an NP-complete problem. Later papers on molecular computing mostly consider 
the possibility of building universal computers for NP-hard problems. Unfortunately 
(for our purposes) little has been done to introduce notions for molecular computation 
on a lower level (finite automata, pushdown machines, etc.). 

Anyway, the recent decade has been rich in proposals of mathematical notions 
generalizing deterministic computation devices. All these devices rely 
on unusually rich "built-in" parallelism. On the other hand, we still do not see a 
complete answer to what might be an adequate mathematical model to describe 
the behavior of the human brain. This is a problem not only for physiologists; 
in fact, it is not primarily a problem for them. Computer scientists can say rather 
a lot about what type of mechanism could possibly be in our brain to make the choices. 
The models of computation existing now are, most probably, not the adequate 
ones. But which ones are? 

3 Riemann Hypothesis and Complexity of Computation 

Generating functions are an invention that puzzles beginners very much. The 
strangeness of this notion comes mainly from the feeling that you "multiply miles by 
miles and get kilograms". 

We consider an example showing the main idea of generating functions as 
they were introduced by Abraham de Moivre (1667 - 1754). 

Definition 3.1 Suppose that a random variable takes the values 

$$X_1 \text{ with probability } p_1, \quad X_2 \text{ with probability } p_2, \quad \ldots, \quad X_s \text{ with probability } p_s.$$

The function 

$$\sum_{i=1}^{s} p_i t^{X_i}$$

is called the generating function of this random variable. 

Example. A die can produce the values 1, 2, 3, 4, 5, 6 with probability $\frac{1}{6}$ 
each. Calculate the probability of the event "the total of the results of two dice 
equals $a$." 

In our example the generating function is 

$$\frac{1}{6}\left(t + t^2 + t^3 + t^4 + t^5 + t^6\right).$$

Squaring this, we obtain 

$$\frac{1}{36}t^2 + \frac{2}{36}t^3 + \frac{3}{36}t^4 + \frac{4}{36}t^5 + \frac{5}{36}t^6 + \frac{6}{36}t^7 + \frac{5}{36}t^8 + \frac{4}{36}t^9 + \frac{3}{36}t^{10} + \frac{2}{36}t^{11} + \frac{1}{36}t^{12}.$$

It remains to find the coefficient of $t^a$. For instance, if $a = 5$, then the result 
is $\frac{4}{36}$. 

The reader can check that the result is indeed correct. On the other hand, 
there is no sense to ask whether the variable t is real, complex or some other. 
This is a formal variable denoting merely the places where the corresponding 
numbers (the coefficients) can be found. 
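
The dice computation can be checked mechanically. The following Python sketch treats a generating function as a dictionary from exponents to coefficients and squares it; the representation and the names `multiply`, `die` and `two_dice` are illustrative choices, not part of de Moivre's method.

```python
from fractions import Fraction


def multiply(p, q):
    """Multiply two polynomials given as {exponent: coefficient} dictionaries."""
    product = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            product[e1 + e2] = product.get(e1 + e2, 0) + c1 * c2
    return product


# Generating function of one fair die: (t + t^2 + ... + t^6) / 6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Generating function of the total of two independent dice.
two_dice = multiply(die, die)

# The probability that the total equals a is the coefficient of t^a.
probability_of_five = two_dice[5]
```

The variable `t` never receives a value in this code: the exponents serve only as labels for the coefficients, exactly as the text says.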

The generating functions used by de Moivre were polynomials, containing only a 
finite number of terms. Leonhard Euler (1707 - 1783) turned the technique of 
generating functions into a powerful tool for obtaining complicated combinatorial 
results. In de Moivre's example we used multiplication of generating 
functions. Euler's functions were represented by infinite power series (not merely by 
polynomials) and the operations used included even taking a derivative. 

The next step in the development of generating functions was made by Johann 
Peter Gustav Lejeune Dirichlet (1805 - 1859). He used more complicated infinite 
series as the "place-holders" for the coefficients, namely Dirichlet series. 

A Dirichlet series is 

$$F(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s},$$

where $s$ can be a real or complex number. Following Euler, we consider the 
case of real values for $s$. $F(s)$ can be considered the generating function for the 
sequence $\{a_n\}$. 

Although this is a more complicated object than a power series, the 
properties of Dirichlet series are rather similar. 

(1) If $\sum a_n n^{-s}$ converges absolutely for $s = s_0$, then it converges absolutely 
for every $s > s_0$. 

(2) If $\sum a_n n^{-s}$ converges absolutely for $s > s_0$, then the derivative of 

$$F(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s}$$

can be calculated term-wise: 

$$F'(s) = -\sum_{n=1}^{\infty} a_n \log n \cdot n^{-s}.$$

(3) If $F(s) = \sum a_n n^{-s} = 0$ for $s > s_0$, then $a_n = 0$ for all $n$. 

(4) Absolutely converging Dirichlet series can be multiplied term-wise. 

The simplest Dirichlet series is the one defining the Riemann zeta-function 

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s},$$

corresponding to the sequence $a_n = 1$. It converges for $s > 1$. 



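$$\zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}, \qquad \zeta(4) = \sum_{n=1}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{90}, \qquad \zeta(2n) = \frac{2^{2n-1}\,|B_{2n}|}{(2n)!}\,\pi^{2n},$$

where $B_n$ is the Bernoulli number (L. Euler), defined by 

$$\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} \frac{B_n x^n}{n!}.$$

The first Bernoulli numbers are 

$$B_0 = 1, \quad B_1 = -\frac{1}{2}, \quad B_2 = \frac{1}{6}, \quad B_4 = -\frac{1}{30}, \quad B_6 = \frac{1}{42}, \quad B_8 = -\frac{1}{30},$$

$$B_3 = 0, \quad B_5 = 0, \quad B_7 = 0, \quad B_9 = 0.$$

Since the odd Bernoulli numbers $B_3, B_5, \ldots$ equal 0, the corresponding values of the 
analytically continued $\zeta$-function, $\zeta(-2), \zeta(-4), \ldots$ (recall that 
$\zeta(-n) = -B_{n+1}/(n+1)$), also equal 0. These are called the trivial roots of the 
$\zeta$-function. Below we discuss the other roots, called nontrivial roots. 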
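
The Bernoulli numbers and the values of the $\zeta$-function at even integers can be cross-checked numerically. The sketch below assumes the convention $x/(e^x - 1) = \sum_n B_n x^n / n!$ (so that $B_1 = -\frac{1}{2}$) and the closed form $\zeta(2n) = 2^{2n-1} |B_{2n}| \pi^{2n} / (2n)!$; the recurrence used to compute $B_n$ is the standard one, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, pi


def bernoulli_numbers(count):
    """Return [B_0, ..., B_(count-1)] from sum_{k<=n} C(n+1, k) B_k = 0 for n >= 1."""
    b = [Fraction(1)]
    for n in range(1, count):
        b.append(-sum(comb(n + 1, k) * b[k] for k in range(n)) / (n + 1))
    return b


def zeta_even(n):
    """zeta(2n) computed from the closed form with Bernoulli numbers."""
    b_2n = bernoulli_numbers(2 * n + 1)[2 * n]
    return 2 ** (2 * n - 1) * abs(b_2n) / factorial(2 * n) * pi ** (2 * n)


def zeta_partial_sum(s, terms):
    """Partial sum of the Dirichlet series defining zeta(s)."""
    return sum(k ** -s for k in range(1, terms + 1))
```

Comparing `zeta_even(1)` and `zeta_even(2)` with long partial sums of the defining series gives agreement to several decimal places, which is a quick sanity check of both the table of Bernoulli numbers and the closed form.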

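Theorem 3.1 (L. Euler) If $s > 1$, then 

$$\zeta(s) = \prod_{p} \frac{1}{1 - p^{-s}},$$

where the product is taken over all prime numbers $p$. 

Dirichlet used this technique to prove that an arbitrary non-degenerate 
arithmetic progression contains infinitely many prime numbers. 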
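
Euler's theorem, in its standard form $\zeta(s) = \prod_p (1 - p^{-s})^{-1}$, can be illustrated numerically by comparing a partial sum over the natural numbers with a partial product over the primes. The truncation limits and function names below are arbitrary choices made for this sketch.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]


def zeta_sum(s, terms):
    """Partial sum of 1/n^s over n = 1 .. terms."""
    return sum(n ** -s for n in range(1, terms + 1))


def euler_product(s, limit):
    """Partial product of 1/(1 - p^(-s)) over the primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1.0 / (1.0 - p ** -s)
    return product


series_value = zeta_sum(2, 100000)
product_value = euler_product(2, 10000)
```

Both truncated values approach $\pi^2/6$ from below, so they agree with each other to about four decimal places even though one runs over all integers and the other only over primes.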

Georg Friedrich Bernhard Riemann (1826 - 1866) was a mathematician who 
influenced 19th century mathematics no less than any of his contemporaries. 

Riemann's doctoral dissertation was on the functions of a complex variable. 
This is why he, along with Augustin-Louis Cauchy (1789 - 1857), is considered the 
father of the modern theory of the functions of a complex variable. 
On June 10, 1854 B. Riemann delivered his academic lecture "On hypotheses 
underlying geometry" to get employment at Göttingen University. This 
is the talk and this is the paper where Riemannian geometry is described for the 
first time. The position he sought was not that of a full professor. It was a 
position of a "Dozent", corresponding to the position of an Associate Professor 
nowadays. 

In 1859 B. Riemann was elected a member of the Berlin Academy 
of Science. According to the rules, the newly elected member was to present a short 
report on his scientific activities. B. Riemann presented an 8-page report "On the 
number of primes less than a given magnitude". Here he considered the 
function $\zeta$ introduced by L. Euler (above) as a function of a complex variable. 
Unfortunately, the function 

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$$

is defined not on the whole complex plane but for $\operatorname{Re} s > 1$ only. 

The complex variable allows new operations over the generating functions, 
among them Analytic Continuation (allowing an analytic function 
defined on a non-trivial domain to be continued, in at most one way, to a larger 
domain of the complex plane), the Fourier transform and its inverse: 

If $f \in L_1$ and 

$$g(\omega) = \int_{-\infty}^{\infty} f(t) e^{i\omega t}\, dt,$$

then the function $g(\omega)$ is uniformly continuous and bounded for $-\infty < \omega < \infty$, 
with $g(\omega) \to 0$ as $|\omega| \to \infty$. 

If $g \in L_1$, then almost everywhere 

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} g(\omega) e^{-i\omega t}\, d\omega.$$

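The $\zeta$-function is the generating function for a rather simple sequence $1, 1, 1, \ldots$ 
However, it can be used to construct generating functions for much more 
complicated sequences related to well-known notions in Number Theory. 

$$\zeta^2(s) = \sum_{n=1}^{\infty} \frac{d(n)}{n^s} \quad (s > 1),$$

where $d(n)$ is the number of divisors of the number $n$, including 1 and $n$. 

$$\zeta(s)\zeta(s-1) = \sum_{n=1}^{\infty} \frac{\sigma(n)}{n^s} \quad (s > 2),$$

where $\sigma(n)$ is the sum of all the divisors of the number $n$, including 1 and $n$. 

$$\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} \quad (s > 1),$$

where $\mu(n)$ is the Möbius function 

$$\mu(n) = \begin{cases} 1, & \text{if } n = 1; \\ (-1)^k, & \text{if } n = p_1 p_2 \cdots p_k, \text{ where } p_1, p_2, \ldots, p_k \text{ are pairwise distinct prime numbers}; \\ 0, & \text{if } n \text{ is divisible by a perfect square greater than 1.} \end{cases}$$

$$\zeta(s)\zeta(s-k) = \sum_{n=1}^{\infty} \frac{\sigma_k(n)}{n^s} \quad (s > k + 1).$$

Here $\sigma_k(n)$ is the sum of the $k$-th powers of all the divisors of $n$. 

$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s},$$

where $\Lambda(n)$ is the von Mangoldt function 

$$\Lambda(n) = \begin{cases} \log p, & \text{if } n = p^m; \\ 0, & \text{if } n \text{ is not a number of the form } p^m. \end{cases}$$

Euler had noticed that 

$$n! = \int_0^{\infty} e^{-x} x^n\, dx \quad (n = 1, 2, 3, \ldots).$$

This made it possible to introduce the $\Gamma$-function 

$$\Gamma(s+1) = \int_0^{\infty} e^{-x} x^s\, dx \quad (s > -1).$$

Riemann analytically continues the $\zeta$-function 

$$\zeta(s) = \frac{\Gamma(-s+1)}{2\pi i} \int_{+\infty}^{+\infty} \frac{(-x)^s}{e^x - 1} \frac{dx}{x},$$

where the integration contour comes in from $+\infty$ along the positive real semi-axis, 
passes around the origin and returns to $+\infty$. Now the function is 
analytic everywhere except for a simple pole at $s = 1$. 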
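
The number-theoretic identities for products of $\zeta$-functions stated earlier in this section are, on the level of coefficients, statements about Dirichlet convolution: multiplying two Dirichlet series with coefficients $a_n$ and $b_n$ gives the coefficients $c_n = \sum_{ij = n} a_i b_j$. The following Python sketch checks three of them for small $n$; the bound `LIMIT` and all names are illustrative assumptions.

```python
def dirichlet_product(a, b, limit):
    """Coefficients c[n] = sum of a[i] * b[j] over i * j = n, for n <= limit."""
    c = [0] * (limit + 1)
    for i in range(1, limit + 1):
        for j in range(1, limit // i + 1):
            c[i * j] += a[i] * b[j]
    return c


def mobius(n):
    """Moebius function computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:          # a squared prime divides the original n
                return 0
            result = -result
        p += 1
    if n > 1:                       # one remaining prime factor
        result = -result
    return result


LIMIT = 30
ones = [0] + [1] * LIMIT                    # coefficients of zeta(s)
identity = list(range(LIMIT + 1))           # coefficients of zeta(s - 1)
mu = [0] + [mobius(n) for n in range(1, LIMIT + 1)]   # coefficients of 1/zeta(s)

d = dirichlet_product(ones, ones, LIMIT)            # zeta^2: number of divisors
sigma = dirichlet_product(ones, identity, LIMIT)    # zeta(s) zeta(s-1): sum of divisors
unit = dirichlet_product(mu, ones, LIMIT)           # (1/zeta) * zeta = 1
```

Index 0 of each list is unused padding, so that `d[n]`, `sigma[n]` and `mu[n]` can be read with the same index $n$ as in the formulas.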

The reader should notice the genre of Riemann's paper. This is not a journal 
paper. Rather, it is a bureaucratic report to the chiefs. No wonder that it 
is not easy to distinguish between correctly proved statements and plausible 
conjectures. 

Riemann proves that 

$$\zeta(s) = \Gamma(-s+1)(2\pi)^{s-1}\, 2 \sin\left(\frac{\pi s}{2}\right) \zeta(1 - s).$$

By a substitution of the variable and multiplication by an appropriate factor, Riemann 
obtains an entire function, i.e. a function without singular points, 

$$\xi(s) = \Gamma\left(\frac{s}{2} + 1\right)(s - 1)\,\pi^{-s/2}\,\zeta(s),$$

and proves that 

$$\xi(s) = \xi(1 - s).$$

Riemann proves that all the roots $\rho$ of $\xi$ are in the strip $0 \le \operatorname{Re} \rho \le 1$ and 
asserts that the number of roots $\rho$ with $0 < \operatorname{Im} \rho < T$ is about 

$$\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi}$$

and the relative error is of the magnitude $\frac{1}{T}$. 

This was proved by von Mangoldt only in 1905. 

Next Riemann asserts that the number of roots with the real part $\frac{1}{2}$ and 
the imaginary part between 0 and $T$ is about 

$$\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi}.$$

This is not proven even today. 

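Riemann proves that 

$$\frac{\log \zeta(s)}{s} = \int_0^{\infty} J(x)\, x^{-s-1}\, dx \quad (\operatorname{Re} s > 1),$$

where $J(x)$ starts at 0 for $x = 0$ and grows in jumps of 1 at prime numbers, in 
jumps of $\frac{1}{2}$ at squares of primes, etc. 

By inverse Fourier transform Riemann obtains 

$$J(x) = \frac{1}{2\pi i} \int_{a - i\infty}^{a + i\infty} \log \zeta(s)\, x^s\, \frac{ds}{s} \quad (a > 1).$$

Since 

$$J(x) = \pi(x) + \frac{1}{2}\pi(x^{1/2}) + \frac{1}{3}\pi(x^{1/3}) + \cdots,$$

Riemann finally concludes that 

$$J(x) = \mathrm{Li}(x) - \sum_{\rho} \mathrm{Li}(x^{\rho}) - \log 2 + \int_x^{\infty} \frac{dt}{t(t^2 - 1)\log t} \quad (x > 1),$$

where the sum $\sum_{\rho}$ is taken over all the non-trivial roots $\rho$ of $\zeta$. 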

These formulas explicitly link properties of the $\zeta$-function (a continuous 
object) with properties of the distribution of primes among the natural 
numbers (discrete objects). This was a most unusual discovery because before 
that every mathematician considered continuous and discrete objects to be two 
worlds apart. 

Riemann's intention was to prove a famous conjecture proposed independently 
by Carl Friedrich Gauss (1777 - 1855) and Adrien-Marie Legendre (1752 
- 1833) in the years 1792 or 1793, asserting that the number of primes among the 
first $x$ natural numbers is approximately 

$$\int_2^x \frac{dt}{\log t}.$$

Since B. Riemann died early (at the age of 39), not much progress (or 
even interest) was shown for nearly 30 years. Finally, the Paris Academy of 
Science announced a competition (1890-1892) on filling the gaps in Riemann's 
proof. The prize was given to Jacques Salomon Hadamard (1865 - 1963), although he 
had not yet solved the problem; that came later. In 1896 he 
and Charles Jean de la Vallée-Poussin (1866 - 1962) independently proved the 
Gauss-Legendre hypothesis on the number of primes. The crucial point 
was to prove that there are no roots of $\zeta$ on the line $1 + it$. 

The prime number theorem is important for Number Theory and sometimes 
is cited in papers on Complexity Theory (for instance, in m). However, for 
our story the most important is the following paragraph in B. Riemann's paper, 
having no formal connection with his topic. 

B. Riemann finds that the number of roots of his auxiliary function $\xi(t)$ in a 
certain domain is indeed as large as needed for his proof and adds: "One finds in 
fact this many real roots within these bounds and it is very likely that all of 
the roots are real. One would of course like to have a rigorous proof of this, 
but I have put aside the search for such a proof after some fleeting vain attempts 
because it is not necessary for the immediate objective of my investigation." 

This is the famous Riemann Hypothesis. Nowadays it is usually formulated 
as an assertion about the $\zeta$-function itself: all the nontrivial 
roots of the $\zeta$-function have the real part $\frac{1}{2}$. 

As was described above, the $\zeta$-function was introduced by L. Euler. 
However, the impact of B. Riemann on the study of this function was so profound that 
nowadays the function's "official" name is the Riemann $\zeta$-function. The function 
was generalized in various ways (Dirichlet L-functions (1837), Hecke L-functions 
(1917, 1918, 1920), E. Artin L-functions (1931), etc.). Counterparts to the 
Riemann Hypothesis were also constructed. Since they are generalizations of the 
Riemann $\zeta$-function, no wonder that the counterparts of the Riemann Hypothesis are 
also not yet proved or disproved. 

The Riemann Hypothesis has turned out to be not merely a difficult open 
problem but also a deep assertion on fundamental properties of mathematical 
objects. Computer scientists know that NP-complete problems can be found in all 
areas of mathematics. Some of them seem to be algebraic problems, some other 
problems seem to belong to differential equations. However, we know that 
they are all reducible one to another. Hence essentially this is the same problem 
under different disguises. The same situation holds for the Riemann Hypothesis. 
Equivalent formulations of this conjecture are found in very many areas of 
classical mathematics. For instance, F. Roesler m has published a paper with 
the title "Riemann hypothesis as an eigenvalue problem" (followed by a series 
of papers of a similar nature by the same author). The author studies the eigenvalues 
$\lambda_n$ ($1 \le n \le N - 1$) of the matrix $A_N = (a_{m,n})_{2 \le m,n \le N}$, where $a_{m,n} = m - 1$ 
if $m \mid n$ and $-1$ otherwise. It turns out that the Riemann hypothesis is equivalent to 

$$\det A_N = O\left(N!\, N^{-\frac{1}{2} + \varepsilon}\right)$$

for every $\varepsilon > 0$. 

Roesler's paper is only one of very many possible examples of this kind. The 
Riemann Hypothesis has become a kind of new axiom. A new area of mathematics 
has been started, containing results based on the assumption that the Riemann 
Hypothesis is true. Who knows, this hypothesis may later be proved independent of 
the other axioms (as the Continuum Hypothesis and the Axiom of Choice have been 
proved independent). 

However, Roesler's example (and so many other examples not mentioned here) 
shows that Theoretical Computer Science can also be a possible area for such 
results dependent on the Riemann Hypothesis. At this moment the so-called Extended 
Riemann Hypothesis (the counterpart of the Riemann Hypothesis for Dirichlet 
L-functions) has had a much deeper influence on Theoretical Computer Science. 
Nobody knows whether this is just a temporary effect or something deeper. 

Gary Miller m constructed a deterministic algorithm to recognize primality 
of natural numbers which runs in polynomial time and is correct if the Extended 
Riemann Hypothesis is true. The algorithm is short and clear. 

Let an odd $N$ be given, and let $N - 1 = 2^s \times m$ with $m$ odd. We choose a base $b$ 
and compute the following sequence modulo $N$: 

$$b^{m},\; b^{2m},\; b^{4m},\; \ldots,\; b^{2^s m} = b^{N-1}.$$

If either all entries are $+1$ or some entry before the last is $-1$, then we say that 
the sequence is of type 1; otherwise the sequence is of type 2. 

Algorithm. 

(1) For each b less than $2(\ln N)^2$: 

(2) IF the b-sequence of the number N is of type 2, 
PRINT "COMPOSITE" and stop. 

(3) ELSE (no b-sequence is of type 2) PRINT "PRIME". 
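
The test can be sketched in Python as follows. This is an illustration under assumptions, not Miller's original formulation: the sequence is taken to be $b^m, b^{2m}, \ldots, b^{2^s m}$ modulo $N$ with $N - 1 = 2^s m$ and $m$ odd, bases start at $b = 2$ and are kept below $N$, and even numbers are handled separately. A "PRIME" answer for large $N$ is guaranteed only under the Extended Riemann Hypothesis; a "COMPOSITE" answer is always correct.

```python
import math


def b_sequence(n, b):
    """Return b^m, b^(2m), ..., b^(2^s m) modulo n, where n - 1 = 2^s * m, m odd."""
    s, m = 0, n - 1
    while m % 2 == 0:
        s, m = s + 1, m // 2
    return [pow(b, m * 2 ** i, n) for i in range(s + 1)]


def is_type_1(n, b):
    """Type 1: all entries are +1, or some entry before the last is -1 (mod n)."""
    seq = b_sequence(n, b)
    return all(x == 1 for x in seq) or any(x == n - 1 for x in seq[:-1])


def miller_is_prime(n):
    """Deterministic Miller test with all bases 2 <= b < 2 (ln n)^2."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    bound = math.ceil(2 * math.log(n) ** 2)
    for b in range(2, min(bound, n)):
        if not is_type_1(n, b):
            return False            # a type 2 sequence proves compositeness
    return True
```

For example, for $N = 9$ and $b = 2$ the sequence is 2, 4, 7, 4, which is of type 2, so 9 is reported composite. The number of bases tried grows only like $(\ln N)^2$, which is what makes the running time polynomial in the length of $N$.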



Extended Riemann Hypothesis is used for very many algorithms produced 
for needs of Computer Science |4ISI2 1 12'2i2',^ . 

Now we consider the most popular computation model in the Theory of Com- 
putation, namely, multi-tape Turing machines. We will show that complexity of 
language recognition by these machines is closely related to Number Theory and, 
specifically, to the Extended Riemann Hypothesis. 

A Turing machine is said to be strongly $L(n)$ space-bounded if no computation 
on any input of length $n$ uses more than $L(n)$ space. It is said to be 
weakly $L(n)$ space-bounded if for every accepted input of length $n$, at least one 
accepting computation uses no more than $L(n)$ space. DSPACE($L$) or NSPACE($L$) 
denotes the class of languages accepted by deterministic or nondeterministic 
$L(n)$ space-bounded Turing machines, respectively. 

Turing machines with sublogarithmic space differ sharply from those which 
have logarithmic or greater space. If the function $L(n)$ is not fully space 
constructible, it is possible that a language $A$ is weakly $L(n)$ space recognizable while 
its complement $\bar{A}$ is not. We concentrate on languages $A$ such that $A$ is weakly 
log log-space recognizable and $\bar{A}$ is weakly log-space recognizable. We denote this 
complexity class by DSPACE(log log, log). 

Specifically, we consider the language 

$$\mathrm{NONSQUARES} = \{1^n \mid \neg(\exists k)(n = k^2)\}.$$

The complement of this language, PERFECT_SQUARES, needs deterministic 
space $\Omega(\log n)$, but the problem whether or not the language NONSQUARES 
is weakly log log-space recognizable turns out to be related to open problems in 
Number Theory. 

By standard methods one can prove that 

1) the two languages are in DSPACE(log), 

2) NONSQUARES has weak space complexity $\Omega(\log \log n)$, 

3) PERFECT_SQUARES has weak space complexity $\Omega(\log n)$. 



Conjecture 1. The language NONSQUARES is weakly recognized by a 
deterministic Turing machine in space log log n. 

Hence under this conjecture NONSQUARES is in DSPACE(log log, log). The 
second conjecture seems to be weaker. 

Conjecture 2. The language NONSQUARES is weakly recognized by a 
nondeterministic Turing machine in space log log n. 

However, we see below that Conjectures 1 and 2 are equivalent. 

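The Legendre symbol $\left(\frac{n}{p}\right)$ is defined for integers $n$, $p$ such that $p$ is an odd prime 
and $n$ is not a multiple of $p$: 

$$\left(\frac{n}{p}\right) = \begin{cases} +1, & \text{if there is } x \text{ such that } x^2 \equiv n \pmod{p}, \\ -1, & \text{if there is no } x \text{ such that } x^2 \equiv n \pmod{p}. \end{cases}$$

The Jacobi symbol $\left(\frac{n}{m}\right)$ is defined as a generalization of the Legendre symbol for 
$m = p_1 \cdot p_2 \cdots p_s$, where $p_1, \ldots, p_s$ are odd primes (some of them can be equal): 

$$\left(\frac{n}{m}\right) = \begin{cases} 0, & \text{if } (m, n) \neq 1, \\ \prod_{i=1}^{s} \left(\frac{n}{p_i}\right), & \text{if } (m, n) = 1. \end{cases}$$

Let $N^*(n)$ be 

$$N^*(n) = \begin{cases} \text{the minimal } m \text{ such that } \left(\frac{n}{m}\right) = -1, & \text{if such an } m \text{ exists}, \\ 0, & \text{otherwise.} \end{cases}$$

By Theorem 3 in §2, Chapter 5 of m, $N^*(n) = 0$ iff $n$ is a perfect square. 
Otherwise $N^*(n)$ is an odd prime. 

Conjecture 3. $N^*(n) = O(\mathrm{poly} \log n)$. 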

Theorem 3.2 Conjectures 1, 2 and 3 are equivalent. 

Proof. (3) $\Rightarrow$ (1). The deterministic Turing machine considers all odd integers 
$m$ in increasing order and tests whether or not the remainder modulo 
$m$ of the length of the input word can be the remainder modulo $m$ of a perfect 
square. If $N^*(n)$ never exceeds poly log n, the space complexity never exceeds 
log log n. 

(1) $\Rightarrow$ (2). Obvious. 

(2) $\Rightarrow$ (3). Assume that NONSQUARES is recognized by a nondeterministic 
Turing machine in space $s(n) = O(\log \log n)$. Only $\mathrm{const}^{s(n)}$ configurations of 
the work tapes are possible, which is less than the length of the input word. 
Hence the machine inevitably repeats configurations. Let $K(n)$ denote the 
l.c.m. of all the positive integers not exceeding $\mathrm{const}^{s(n)}$. If a word $w$ is accepted 
by the nondeterministic machine, and we consider a word $w'$ such that $|w'| = 
|w| + K(n)$, then there is a computation accepting $w'$ as well. Hence acceptance is 
determined only by the remainder of the length of the input word modulo $K(n)$. 
Hence $K(n) \ge N^*(n)$. If Conjecture (3) fails, then $N^*(n) > O(\mathrm{poly} \log n)$. Then 
$K(n) > O(\mathrm{poly} \log n)$ and 

$$\mathrm{const}^{(s(n))^{2}} > O(\mathrm{poly}\log n),$$

$$(s(n))^{2} > r \log n,$$

$$s(n) > \sqrt{r \log n},$$

contradicting Conjecture (2).
Please notice that Conjectures (1) and (2) belong to Complexity Theory
while Conjecture (3) belongs to Number Theory.

Many people consider Number Theory a highly abstract area removed
from anything practical. The equivalence between a problem in Complexity Theory
and a problem in Number Theory not only certifies the real difficulties underlying
the open problems in Complexity Theory but also shows that Number Theory
is no longer an area remote from Computer Science.

A. Cobham [2] proved (using the proof of N. Ankeny [?]) that Conjecture
(1) is implied by the Extended Riemann Hypothesis. Hence all our Conjectures
(1)-(3) are true in the mathematics where the Extended Riemann Hypothesis
holds.

David Hilbert (1862-1943) placed the Riemann Hypothesis in his famous list of
the Hilbert Problems, i.e. the mathematical problems which he predicted would
have the greatest impact on the development of mathematics. It seems that
mathematicians have agreed that Hilbert was right in his choice. Anyway,
even without being solved this problem has had a fantastic impact. One can
only wonder whether or not B. Riemann himself recognized the importance of
his hypothesis.

I would like to add one more result on the Riemann Hypothesis that has had much
impact. Arnaud Denjoy (1884-1974) equivalently interpreted the Riemann
Hypothesis in statistical terms. If the values of the Möbius function were
random, then they would be statistically independent. Then, with probability 1,

$$\left|\frac{M(x)}{\sqrt{x}}\right| < (\ln x)^{2}$$

would be true, where

$$M(x) = \sum_{n \le x} \mu(n).$$

The Riemann Hypothesis is equivalent to

$$|M(x)| = O\left(x^{\frac{1}{2}+\varepsilon}\right).$$
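The quantities in these formulas are easy to tabulate for small x. The following Python sketch is an illustration only (the function names are hypothetical and the sieve is a standard textbook construction, not something taken from the text): it computes the Möbius function up to a limit, sums it to obtain M(x), and returns the ratio |M(x)|/√x that the statistical interpretation talks about.

```python
from math import sqrt


def mobius_table(limit: int) -> list[int]:
    """Return mu[0..limit], the Möbius function computed with a sieve."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for k in range(2 * p, limit + 1, p):
            is_prime[k] = False
        for k in range(p, limit + 1, p):
            mu[k] = -mu[k]          # one more prime factor flips the sign
        for k in range(p * p, limit + 1, p * p):
            mu[k] = 0               # divisible by a square
    return mu


def mertens(x: int) -> int:
    """M(x): the sum of mu(n) over 1 <= n <= x."""
    return sum(mobius_table(x)[1:])


def normalized_mertens(x: int) -> float:
    """|M(x)| / sqrt(x), the ratio bounded in the statistical heuristic."""
    return abs(mertens(x)) / sqrt(x)
```

Small values such as M(10) = -1 and M(100) = 1 show how slowly the sum grows compared with x, although no finite table says anything about the hypothesis itself.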

Whatever results are obtained in this direction and whatever its prospects,
please remember that randomness is a notion deeply connected to Kolmogorov
complexity.

The year 1941 brought a most unexpected continuation of these efforts of
many mathematicians. André Weil (b. 1906) [?] proved the Riemann Hypothesis for
congruence ζ-functions in elliptic functions over finite fields. This line of
research was continued by many people, but the strongest results were obtained
by P. Deligne [?]. The results by Weil and Deligne unfortunately do not say
anything about the original problem. These ζ-functions are no longer
generalizations of the Riemann ζ-function.

It might be expected that these results, duly respected as classics of 20th
century mathematics, are surely very far from Computer Science. No, in our time
the distance in time between the highest and most abstract achievements of
classical mathematics and the applications of results in Theoretical Computer
Science is brought to a minimum. Michael Ben-Or [?] and Eric Bach [?] have
used Weil's theorem for their results. Ben-Or's result was in an area somewhat
related to elliptic functions, but Bach used this theorem in a situation where
the formulation of the problem had nothing in common with Riemann and Weil.
He was interested in the reason why some randomized polynomial-time algorithms
work so well in practice.
Abstract. The realisation of an electronic commerce process is very difficult
without an appropriate system for electronic payment. Of course it is
possible to use some of the conventional payment instruments (e.g. payment
cards, bills, payment orders), but these instruments do not fit the
requirements for improving the commerce process. The resulting electronic
commerce system would have disadvantages inherited from these conventional
payment instruments. New electronic payment systems should
be developed to satisfy all requirements of electronic commerce. The
main idea of most of these systems is to convert conventional money to
its electronic equivalent - electronic money.



1 Introduction 

According to most definitions, an electronic commerce system is a system that
performs electronically all (or almost all) activities connected with the
conventional commerce process. The purpose of converting these activities into
electronic form is to improve the commerce process, especially:

— to speed up the commerce process and the turnover of money (using a faster
medium for commerce process activities)

— to create new types of goods (especially electronic goods or "soft goods",
e.g. electronic publications, electronic services, multimedia products)

— to make the commerce process more convenient for the customer 

— to find new types of commerce activities (e.g. information retrieval services) 

— to find new types of customers and to globalise the customer area 

The growth of electronic commerce systems makes the requirements for
suitable payment systems quite urgent and important. A number of payment
systems have been proposed and some of them have been implemented. But we are
still not satisfied with the quality of existing payment systems, and the
evolution of these systems is still far from the required state.

The problems of electronic money are tightly coupled with the problems
of information system security and mainly with those of cryptography. The
design of a functional electronic payment system is not so difficult. The
design of a functional and secure payment system is quite difficult, and
security mechanisms are usually the most important part of the payment system.
2 Electronic Commerce and Electronic Money

The term "electronic money" (according to [?]) has been used in different
settings to describe a wide variety of payment systems and technologies.
"Stored-value" products are generally prepaid payment instruments in which a
record of funds owned by or available to the consumer is stored on an electronic
device in the consumer's possession, and the amount of stored "value" is
increased or decreased, as appropriate, whenever the consumer uses the device to
make a purchase or other transaction. By contrast, "access" products are those
typically involving a standard personal computer, together with appropriate
software, that allow a consumer to access conventional payment and banking
products and services, such as credit cards or electronic funds transfers,
through computer networks such as the Internet or through other
telecommunications links.

3 Basic Principles of Electronic Money 

3.1 Properties of Electronic Money 

When implementing electronic money, a big effort has been made to make it as
close as possible to real, physical money. Okamoto and Ohta present in [?] the
following six properties of an ideal electronic payment system:

1. Independence. The security of electronic money does not depend on any special
physical conditions. No special hardware is necessary and money can be sent
over the network.

2. Security. Electronic money cannot be copied, modified, or double-spent.

3. Privacy, anonymity and non-traceability. The privacy of the user is
protected. Nobody can deduce the link between a user and his payment. The
customer may perform operations anonymously.

4. Off-line payment. The protocol for electronic payment between customer and
merchant can be performed off-line. No direct link to a third party (e.g. bank)
is necessary.

5. Transitivity. The electronic money can be transferred to any other user.

6. Divisibility. The electronic coin C can be divided into any number of other
coins. Any of these coins can have any value smaller than C, and the sum
of the values of these coins is equal to C.

Note that these are properties of an ideal electronic payment system. No
currently working electronic payment system meets all of these properties
together.

3.2 Implementation of Electronic Money 

This section provides a general overview of the electronic payment system
products which we are interested in. Figure 1 illustrates the general structural
model common to most electronic money systems, including participants and their
interactions.

[Figure 1: structural model of an electronic money system; surviving label: "debit operation"]

An electronic payment system contains the following parties:

Client (customer) - party that gets electronic money from the client bank
(issuer) and pays the merchant.

Merchant - party that gets electronic money from the client and sends this
money (in the form of payment transactions) to the merchant bank (acquirer).

Acquirer (usually the bank of the merchant) - party that gets the transactions
(i.e. electronic money) from its merchants and clears these payment
transactions with the appropriate issuer (client bank).

Issuer (usually the client bank) - party that gives the electronic money to
its clients and later receives this money from the acquirer.

The actions in this model are:

Credit (loading) means transferring monetary value from the issuer to
the payment instrument (e.g. electronic purse) of the client.

Debit (purchase, payment) means transferring monetary value from the payment
instrument of the client to the payment instrument of the merchant (usually a
payment terminal). The terminal then creates a payment transaction that
contains the electronic money and other payment details.

Transaction collecting means transferring the payment transactions from the
merchant to the acquirer.

Payment clearing means clearing of payment requests between acquirer and
issuer.

From the security point of view the most sensitive operations are credit and
debit. The main threats are concentrated in these two operations. These threats
include the use of a fake payment instrument, modification of the
communications of the payment instrument, and illegal crediting.

The other two operations are less sensitive and the probability of a security
incident during these operations is much smaller.

Physical devices, such as smart cards or personal computers, are held by
clients and by merchants. Merchants interact with clients and with their
acquiring bank or other collection point, such as a third-party payment
processor. Issuers receive funds in exchange for prepaid balances distributed to
clients and manage the "float" in the system that provides financial backing for
the "value" issued to consumers. In some cases, other intermediaries, such as
banks, retailers or service providers, distribute stored-value devices and
balances directly to consumers. The system may include a central clearing house
or system operator.



4 Taxonomy of Electronic Payment Systems 



Generally there are two kinds of electronic payment systems - on-line systems
and off-line systems. On-line systems require a direct communication connection
with the electronic money issuer (usually the bank) during every transaction
(credit or debit). Off-line systems allow a payment transaction to be performed
without such an on-line connection with the issuer.

From the privacy point of view, payment systems are divided into identifiable
and anonymous. An identifiable payment system allows the issuer of electronic
money to identify the participants of every transaction and gives it the
possibility to trace the path of electronic money. An anonymous payment system
preserves one of the properties of real metal coins - anonymity and
untraceability. The issuer has no possibility to follow the path of electronic
money. Anonymity is quite a desirable property of a payment system.

We also distinguish whether the payment system uses an intelligent token - a
smart card (sometimes also called an electronic purse or electronic wallet) - or
only a non-intelligent payment instrument.

The next criterion is whether the payment instrument (magnetic card, smart
card, personal digital assistant, personal computer) carries an electronic
monetary value in itself (systems with electronic money) or does not carry any
value (systems without electronic money).

If the system uses electronic money, we consider the implementation of
value in the payment instrument. The value can be implemented using a counter
(counter-based systems) or using electronic coins.

According to the cryptographic mechanisms that are used, electronic
payment systems are based on secret-key cryptography or on public-key
cryptography.

The taxonomy of electronic payment systems is shown in the following figure:

Electronic payment systems (EPS)

— EPS without electronic money
    — electronic banking (e.g. any home banking)
    — magnetic payment card (e.g. credit card)
    — payment smartcard (e.g. VISA Easy Entry)
    — network system without electronic money (e.g. SET)

— EPS with counters
    — prepaid smartcard (e.g. telephone card)
    — electronic purse with electronic cheques

— EPS with electronic coins
    — electronic purse with electronic coins
    — network payment system with electronic coins



Fig. 2. Taxonomy of electronic payment systems 



4.1 Electronic Payment Systems without Electronic Money 

The most characteristic property of electronic payment systems without
electronic money is that the payment instrument (e.g. magnetic payment card,
smart card, or personal computer) does not contain any electronic money. The
payment instrument performs only identification and authentication of the client
and is sometimes used for cryptographic securing of the messages or for
non-repudiation by the client. The messages exchanged between the client and
the other party also do not contain any electronic money - they contain orders
to transfer money from account to account. Payment systems without electronic
money are typically identifiable and on-line payment systems. Typical examples
of payment systems without electronic money are electronic banking systems,
magnetic payment cards, payment smartcards, and computer network payment
systems.

Electronic Banking System. Electronic banking systems can have many different
names - home banking, internet banking, telebanking, etc. An electronic
banking system performs exchange of banking information between bank and
client using a personal computer, modem, and telephone line. The system usually
allows common passive operations with client accounts (e.g. examining the
account balance and account history) and also some active operations (e.g.
sending payment orders). It is clear from the nature of the electronic banking
system that it does not contain any electronic money.



Magnetic Payment Card. A magnetic payment card is used for withdrawal
of cash from ATMs (Automated Teller Machines) or for performing cashless
payments. The card itself does not contain any electronic money. All relevant
information is located in the bank's central computer, and the payment card
(together with the PIN) is used only for identification and authentication of
the client.

Payment Smartcard. A payment smartcard is used instead of a magnetic payment
card. The smartcard itself does not contain any electronic money and all
information is stored in the bank computer. The payment smartcard is a direct
replacement of the magnetic payment card and has the same functionality.
Its advantage is a higher level of security - it is harder to copy, it can
locally verify the PIN code, and it can effectively limit the number of
unsuccessful PIN attempts.



Payment Systems for Computer Networks. Payment systems for computer
networks exist both without electronic money and with electronic money. Because
of the implementation complexity of electronic money, most of these systems are
without electronic money. The main characteristics of these systems are that the
payment instrument is a personal computer and the communication is performed
over the Internet. It is desirable that no additional hardware is required on
the client side. Because of the relatively large computational power on the
client side, the cryptographic mechanisms are usually based on public-key
cryptography.

One of the best known payment systems for computer networks is SET [?].
In the second half of 1995 two separate draft specifications for making secure
payments over insecure networks such as the Internet were published: the Secure
Transaction Technology (STT) sponsored by Visa International and Microsoft,
and the Secure Electronic Payment Protocol (SEPP) sponsored by MasterCard
International. However, in early 1996 Visa International and MasterCard
International published for comment a joint draft specification called Secure
Electronic Transactions (SET). SET is aimed at transactions made using existing
payment products, such as credit and debit cards, rather than electronic money
products. The specification identifies five parties to any transaction: the
cardholder, issuer, merchant, acquirer and payment gateway. The cardholder
initiates the purchase across the network from his personal computer. Use is
made of "trusted software" and authentication information on the PC.

SET specifies the use of message encryption, digital signatures and
cryptographic certificates to provide confidentiality of information, integrity
of payment data and authentication of cardholders and merchants. SET specifies
RSA-based cryptography using 768, 1,024 or 2,048 bit keys and a hierarchy of
certification authorities.



4.2 Electronic Payment Systems with Counters 

The simplest implementation of an electronic payment system with electronic
money is to represent the amount of money carried by the client as the value of
a counter stored in the secure hardware token that is used as a payment
instrument. When the payment instrument is credited, the counter is incremented
by the credited value. When the debit operation is performed, the counter is
decreased by the value paid. The system is rather simple, and the messages
exchanged between parties are also simple and independent of the exchanged
monetary value. The security of such a system relies on the security of the
payment instrument and on its resistance against unauthorised tampering. When
the payment instrument is "broken", the attacker can create an arbitrary amount
of fictitious money. The most common variants of counter-based payment systems
are the prepaid card and the electronic purse.

Prepaid Card. Prepaid cards (according to [?]) were first developed as a
single-purpose payment instrument for which the card issuer and the merchant
have been one and the same party (e.g. telephone cards or parking cards). Such
cards have not raised central bank concerns because the value embedded in them
(i.e. the value of the counter) did not have a wide range of uses and,
therefore, did not have the characteristics of money. A prepaid card is usually
implemented by a smart card, but in the case of small amounts a magnetic card
can also be used. The prepaid card is a typical anonymous off-line payment
system.

Electronic Purse with Electronic Cheques. Drawing on the experience of
prepaid cards, a new payment instrument is under development in many
countries: the multi-purpose prepaid card, also known as the "electronic purse".
Electronic purses differ from other cashless payment instruments in that they
are supplied in advance with generally accepted purchasing power. They can be
loaded at bank counters, through Automated Teller Machines or through
specifically equipped telephones, against a debit entry in a bank account, or
against banknotes and coins. The embedded purchasing power is drawn down at the
point of sale by an electronic device that can suitably adjust the information
on the card.

Inside the electronic purse there is again a counter that directly represents
monetary value. When the purse is debited, the counter decreases its value and
the purse issues an electronic message (electronic cheque) with the debited
value. The electronic cheque is cryptographically secured. The merchant stores
received electronic cheques in its payment instrument (usually called a point
of sale or payment terminal) in the form of payment transactions. The
transactions are then submitted by the merchant to the bank and consequently
the merchant's account is credited. The electronic purse is a typical
identifiable off-line payment system.
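The debit-and-cheque mechanism just described can be sketched as follows. This is a minimal illustration, not the design of any concrete purse: the text only says that the cheque is "cryptographically secured", so the use of HMAC-SHA256 with a key shared between purse and verifier, the sequence number, and all names (`Purse`, `verify_cheque`) are assumptions made for the example.

```python
import hashlib
import hmac
import json


class Purse:
    """Counter-based purse that issues a MAC-protected cheque on debit."""

    def __init__(self, purse_id: str, key: bytes, balance: int = 0):
        self._id = purse_id
        self._key = key          # assumed to be protected inside the token
        self._counter = balance  # the counter represents monetary value
        self._seq = 0            # assumed: makes every cheque unique

    @property
    def balance(self) -> int:
        return self._counter

    def debit(self, value: int):
        """Decrease the counter and return a cheque, or None on failure."""
        if value <= 0 or value > self._counter:
            return None
        self._counter -= value
        self._seq += 1
        body = json.dumps(
            {"purse": self._id, "seq": self._seq, "value": value},
            sort_keys=True,
        ).encode()
        mac = hmac.new(self._key, body, hashlib.sha256).hexdigest()
        return {"body": body, "mac": mac}


def verify_cheque(cheque: dict, key: bytes) -> bool:
    """Check that the cheque body was produced by a holder of the key."""
    expected = hmac.new(key, cheque["body"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cheque["mac"])
```

Because the cheque names the issuing purse, a system built this way is identifiable, which matches the classification given in the text.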

4.3 Electronic Payment Systems with Electronic Coins 

The main characteristic of electronic payment systems with electronic coins is
that the payment instrument contains a number of digitally signed pieces of data
- electronic coins. The value stored in the payment instrument is equal to the
sum of the values of the individual electronic coins.

When the payment instrument is credited, the issuer creates the required number
of electronic coins and sends them to the client's payment instrument. The
payment instrument only stores the electronic coins - it is used as a storage
medium for electronic coins and also as a "guard" against double spending of the
same electronic coin. At the moment of payment (i.e. the debit operation) the
payment instrument sends to the receiving party (usually the merchant) an
appropriate number of electronic coins. The electronic coins retain most
properties of real coins, and a payment system with electronic coins is usually
an anonymous off-line system with transitivity.

From the security point of view, coin-based systems are better than
counter-based systems. When an electronic purse with a counter is "broken" (i.e.
the attacker knows the secrets stored in the purse), the attacker is able to
create a large amount of fake electronic money, and the probability of
identifying the attacker is usually small. When an electronic purse with coins
is "broken", the attacker is still not able to generate any new electronic
coin - he is only able to spend existing electronic coins twice or more times.

From the technical point of view, the implementation of a coin-based payment
system is quite difficult and we are still not satisfied with current
implementations. The main problem is the already mentioned double spending.
Double spending is a natural problem of electronic coins based on electronic
signatures. The electronic coin has a lot of properties equivalent to real
coins. But there is one property that is inherently different - the ability to
distinguish between an original and a copy. When a real coin is copied, it is
usually easy to distinguish between the original and the fake copy. When an
electronic coin is copied, the original and the copy are exactly the same. So
the electronic coin could be spent many times, which is highly undesirable.
There is no common and simple solution to this problem. One possible solution
is to check in a central database, before accepting the electronic coin,
whether this coin has already been spent - this solution enforces an on-line
connection. Another possible solution is to use an "electronic guard", usually
represented by a smartcard, that ensures that the coin is spent exactly once.
This solution does not allow a software-only implementation of the payment
instrument. There are also some other possibilities, but these also have
disadvantages. The next problem of electronic coins is divisibility. Simple
implementations do not solve divisibility and it can sometimes be difficult to
combine the available coins into the desired amount. There are also some
cryptographically based solutions, but these solutions are usually difficult to
implement and expensive.
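The first solution mentioned above, an on-line check against a central database of spent coins, can be sketched in a few lines of Python. The sketch rests on simplifying assumptions that are not in the text: the issuer's digital signature is replaced by an HMAC under a key known only to the issuer (so only the issuer can verify coins), each coin carries a random serial number, and the class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets


class CoinIssuer:
    """Issues coins and keeps the central database of spent serial numbers."""

    def __init__(self):
        self._key = secrets.token_bytes(32)  # stand-in for a signing key
        self._spent = set()                  # central database of spent coins

    def _sign(self, serial: str, value: int) -> str:
        message = f"{serial}:{value}".encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def mint(self, value: int) -> dict:
        """Create a new coin: serial number, value, and issuer's tag."""
        serial = secrets.token_hex(16)
        return {"serial": serial, "value": value, "sig": self._sign(serial, value)}

    def deposit(self, coin: dict) -> str:
        """On-line check performed before a coin is accepted."""
        expected = self._sign(coin["serial"], coin["value"])
        if not hmac.compare_digest(expected, coin["sig"]):
            return "forged"
        if coin["serial"] in self._spent:
            return "double-spent"
        self._spent.add(coin["serial"])
        return "accepted"
```

A bit-for-bit copy of a coin passes the signature check, which is exactly the problem described in the text; only the lookup in the spent set tells the second presentation from the first.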

The result is that at present the majority of payment systems with electronic
money are counter-based payment systems and only a few really working systems
are coin-based. But coin-based systems have many advantages, and in the future
we can expect that implementations of coin-based systems will be quite common.



5 The Role of Cryptography 



The role of cryptography is very important in the design of electronic payment
systems. The application of cryptographic mechanisms can help achieve
objectives such as confidentiality, data integrity, authentication, and
non-repudiation (see [?]).

The cryptographic mechanisms used in electronic payment systems include
secret key encryption/decryption, one-way hash functions, challenge-response
cryptographic protocols, digital signatures and key management protocols.

The cryptographic principles and building blocks described above are used to
achieve security functions such as confidentiality, data integrity,
authentication, and non-repudiation. Confidentiality is typically achieved by
using triple-DES as the encryption method. Although it can also be done by
applying asymmetric algorithms, owing to performance and price considerations
the symmetric algorithms are generally preferred.

DES is also referred to as single-DES, to distinguish it from triple-DES.
Triple-DES encryption consists of three consecutive operations (encryption;
decryption; encryption) in which two DES keys are used (or a double-length DES
key). Triple-DES has been developed in response to the increasing processing
capabilities of computers and ensures that an exhaustive key search would still
demand a considerable amount of resources.
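The encryption-decryption-encryption structure with two keys can be illustrated independently of DES itself. In the Python sketch below the block cipher is a deliberately trivial, insecure stand-in (`toy_encrypt` / `toy_decrypt`, invented for this example), because DES is not part of the Python standard library; only the way the three operations are composed corresponds to the description above.

```python
MASK = (1 << 64) - 1  # work on 64-bit blocks, the DES block size


def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK


def _rotr(x: int, r: int) -> int:
    return ((x >> r) | (x << (64 - r))) & MASK


def toy_encrypt(key: int, block: int) -> int:
    """Insecure placeholder for single-block encryption (NOT DES)."""
    return _rotl((block + key) & MASK, 7) ^ key


def toy_decrypt(key: int, block: int) -> int:
    """Inverse of toy_encrypt."""
    return (_rotr(block ^ key, 7) - key) & MASK


def ede_encrypt(k1: int, k2: int, block: int) -> int:
    """Two-key triple encryption: encrypt with k1, decrypt with k2, encrypt with k1."""
    return toy_encrypt(k1, toy_decrypt(k2, toy_encrypt(k1, block)))


def ede_decrypt(k1: int, k2: int, block: int) -> int:
    """Inverse of ede_encrypt: decrypt with k1, encrypt with k2, decrypt with k1."""
    return toy_decrypt(k1, toy_encrypt(k2, toy_decrypt(k1, block)))
```

One consequence of placing a decryption in the middle is visible directly in the sketch: when both keys are equal, the first two steps cancel and the result is the same as a single encryption.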

Data integrity and authentication (including non-repudiation) are achieved
by using DES, triple-DES and public key algorithms such as RSA, and by
applying well-known hashing and MAC algorithms, such as MD5, SHA-1 and RSA.

6 The Role of Tamper Resistant Hardware 

The concept of tamper resistant hardware is tightly coupled with the concept of
the reference monitor. The reference monitor was defined in [?] and was
standardised in [?]. The reference monitor concept was found to be an essential
element of any system that would provide multilevel secure computing facilities
and controls. The reference monitor is also at the heart of most cryptographic
modules using secret-key cryptography. A usual implementation of the reference
monitor is a reference validation mechanism, so we will define the reference
monitor in this implementation (see [?]). A reference validation mechanism is
defined as "an implementation of the reference monitor concept that validates
each reference to data or programs by any user (program) against a list of
authorised types of reference for that user." Three design requirements that
must be met by a reference validation mechanism are:

1. The reference validation mechanism must be tamper proof. 

2. The reference validation mechanism must always be invoked. 

3. The reference validation mechanism must be small enough to be subject to 
analysis and tests, the completeness of which can be assured. 
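The quoted definition, validating each reference against a list of authorised types of reference for the user, can be sketched as a small Python class. This is an illustration only: the class names, the `"read"`/`"write"` reference types and the audit log are assumptions of the example, and a software object of this kind cannot by itself satisfy the tamper-proofness requirement.

```python
class ReferenceValidationMechanism:
    """Validates every reference against the user's authorised reference types."""

    def __init__(self, authorised: dict):
        # authorised maps a user to the set of (object name, reference type)
        # pairs that this user may perform.
        self._authorised = {u: frozenset(refs) for u, refs in authorised.items()}
        self.audit_log = []  # assumed addition: record of every decision

    def validate(self, user: str, obj: str, ref_type: str) -> bool:
        allowed = (obj, ref_type) in self._authorised.get(user, frozenset())
        self.audit_log.append((user, obj, ref_type, allowed))
        return allowed


class GuardedStore:
    """Data store whose only access path goes through the mechanism,
    so that the mechanism is invoked on every reference."""

    def __init__(self, rvm: ReferenceValidationMechanism, data: dict):
        self._rvm = rvm
        self._data = dict(data)

    def read(self, user: str, name: str):
        if not self._rvm.validate(user, name, "read"):
            raise PermissionError(f"{user} may not read {name}")
        return self._data[name]

    def write(self, user: str, name: str, value) -> None:
        if not self._rvm.validate(user, name, "write"):
            raise PermissionError(f"{user} may not write {name}")
        self._data[name] = value
```

Keeping the decision logic to a single set lookup reflects the third requirement: the smaller the mechanism, the more feasible its complete analysis and testing.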

The most common implementation of this concept in electronic payment
protocols uses a smart card as the payment instrument. The smartcard has such
physical and logical properties that it complies with the three conditions
above. The conditions are met in the following ways:

1. The reference validation mechanism is tamper proof because of the physical
properties of the smartcard used, which is designed as secure hardware that is
resistant against physical, electrical, electro-magnetic, and chemical
tampering.

2. The reference validation mechanism is always invoked because of the
communication protocol, which is the only way to communicate with the smartcard.

3. The reference validation mechanism is small enough to be subject to analysis
and tests, because of the simplicity and standardisation of the communication
protocol that is used.

For a long time the tamper resistance of smartcards and security processors
was accepted without discussion. It was known that large companies, like Intel
or IBM, can successfully reverse-engineer complex chips, but everybody thought
that this kind of attack was far beyond the abilities of general attackers. The
problem of evaluating the level of tamper resistance offered by a given product
has been neglected by the security research community. It has since been
discovered that attacks on tamper resistance are also possible for small
companies and even for individuals (see [?], [?]). The tamper resistance of
smartcards and security processors now has to be closely examined product by
product to discover possible vulnerabilities.



7 Example System of Smart Card Based Electronic Purse 



Recently a new payment instrument has emerged: the multipurpose prepaid card
or "electronic purse". It is a plastic card which contains real purchasing
power, for which the customer has paid in advance. Although developments in the
field of electronic purses are only at an early stage, the possibility of
proliferation of such cards is a real one. In the future, if electronic purses
were used in a great number of retail outlets, they would become a direct
competitor not only to cashless payment instruments already in existence, but
also to notes and coins issued by central banks and national authorities.

In the following sections we describe the proposed payment system that is being
developed at the Department of Computer Science and Engineering, TU Brno.
This payment system is developed in the framework of the development of a
student smart card that, besides other functions, should serve as an electronic
purse for a closed payment system. According to the previously defined
taxonomy, the proposed system is an electronic purse with electronic cheques,
i.e. a system with counters.

The payment instrument contains a counter. The value of the counter
is equivalent to the monetary value that is stored in the payment instrument.
The value of the counter can be changed using two operations - debit and credit.
These two operations correspond to two commands of the payment instrument
and have the following semantics:

— Operation CREDIT (VALUE) increments the COUNTER by the value VALUE.
The input parameter of this operation is the credit value VALUE. The
operation has no output parameters - only a status is returned that indicates
successful performance of the operation.

— Operation DEBIT (VALUE) checks whether the value of the COUNTER is
greater than or equal to VALUE. If this is not the case, the operation
immediately quits with a status that indicates unsuccessful performance of the
operation. Otherwise the operation decrements the COUNTER by the value VALUE.
The input parameter of this operation is the debit value VALUE. The operation
has no output parameters - only a status is returned that indicates successful
or unsuccessful performance of the operation.
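The semantics of the two operations can be written down directly. The Python sketch below follows the description above; the class name, the `Status` values and the rejection of non-positive amounts are assumptions of the example (the text does not say how such inputs are treated), and the sketch contains no security mechanism of any kind.

```python
from enum import Enum


class Status(Enum):
    OK = "operation performed"
    FAILED = "operation not performed"


class CounterInstrument:
    """Payment instrument whose monetary value is a single counter."""

    def __init__(self):
        self.counter = 0

    def credit(self, value: int) -> Status:
        """CREDIT(VALUE): increment COUNTER by VALUE, return only a status."""
        if value <= 0:            # assumption: reject non-positive amounts
            return Status.FAILED
        self.counter += value
        return Status.OK

    def debit(self, value: int) -> Status:
        """DEBIT(VALUE): fail if COUNTER < VALUE, otherwise decrement."""
        if value <= 0:            # assumption: reject non-positive amounts
            return Status.FAILED
        if self.counter < value:  # COUNTER must be >= VALUE
            return Status.FAILED
        self.counter -= value
        return Status.OK
```

A short usage example: after `credit(100)` a call `debit(30)` succeeds and leaves 70 in the counter, while a following `debit(80)` returns the failure status and leaves the counter unchanged.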



7.1 Security Requirements 

From the functionality point of view the operations CREDIT and DEBIT are 
correct. From the security point of view are operations with such semantics not 
suitable, because they do not prevent against following attacks: 

1. Tampering with the payment instrument. The value of the counter can be
modified (naturally, increased) not only by performing the credit operation,
but also by direct logical or physical manipulation of the payment
instrument. Such manipulation includes patching in the case of a software
payment instrument, or electrical tampering (e.g. using microprobes to inject
electrical signals) in the case of a hardware payment instrument.

2. Use of a fake payment instrument. Such an instrument emulates the behaviour
of the debit operation and gives the client an unlimited amount of money
without any crediting.

3. Modifying the communication between the payment instrument and the payment
terminal. The communication could be modified in such a way that the negative
status code from an unsuccessful debit operation is changed to a positive
status, although the payment instrument does not contain enough money to
perform the payment.

4. Illegal crediting of a genuine payment instrument. This attack can be
carried out simply by issuing the credit command to the payment instrument.

In the following text we describe the security concept of the two most
sensitive operations, debit and credit, which prevents the above attacks.
The first attack is prevented by using tamper-resistant hardware and the
remaining three attacks are prevented by using cryptographic protocols.

7.2 Smart Card Used 

The smart card used is the AT card. The AT card is an authentication smart
card adapted to cryptographic and prepaid card applications. It incorporates
the ISO 7816-4 standard commands and return codes. The AT card operating
system must fulfil two main functions:

1. Be a general purpose operating system for smart card applications.

2. Provide security processing for authentication and prepaid applications.







P. Hanacek 



For the card user, the AT card has many applications. Personal data such
as a medical history can be stored. It is possible to support financial
applications such as EFTPOS, or to implement electronic wallet or cheque book
functions, easily and securely within the AT card system. In fact, any
application requiring the storage and retrieval of small to medium volumes of
data with restricted or general access is possible within the AT card
structure. The use of recognised international standards, where applicable,
makes both the cards and the applications developed for them acceptable
across national boundaries.

The communications protocol conforms to ISO/IEC 7816-3 in order to make
the card readable by general purpose reading equipment. To achieve reliable
and secure data transfer, data encryption is based on the ANSI DES
(X3.92-1981) algorithm and the ANSI 3-DES algorithm. Message authentication is
based upon ANSI X9.9-1982. Table 1 summarises the AT card commands.

Table 1. AT card commands

Command                  Description
VERIFY                   Compares a card holder verification value (PIN)
                         against a reference value.
SETPIN                   Changes the card holder verification value (PIN).
READ BINARY              Reads data from a data file.
UPDATE BINARY            Updates data in a data file.
GET CHALLENGE            Generates an eight byte challenge and provides it
                         to the external world.
PUT RANDOM               Initiates the computation of the session key, based
                         on a supplied random number sent from the reader.
INTERNAL AUTHENTICATE    Allows an external application to verify whether
                         the card or an application on the card is authentic.
EXTERNAL AUTHENTICATE    Authentication of the external world based on a
                         previously generated random number and a secret key.
DECREASE (DEBIT)         Decreases the value in a purse file by a specified
                         amount and returns the new value.
INCREASE (CREDIT)        Increases the value in a purse file by a specified
                         amount and returns the new value.



7.3 Cryptographic Protocol for Credit and Debit Operations 

The credit and debit operations are cryptographically secured using a
so-called MAC (Message Authentication Code). A MAC ensures both the
authentication (i.e. proof of origin) of the secured message and its
integrity (i.e. protection against modification).

$\mathrm{MAC}_K(M)$ is a fixed-length value, usually 32 or 64 bits long, that
is a function of the message M and the secret cryptographic key K (see Fig. 3).
This value is computed by the creator (sender) of the message and is appended
to the message. The recipient, who knows the same secret key K as the creator,
can independently compute the MAC value from the received message M and the
key K and then compare the received MAC with the computed MAC. If both values
are equal, the recipient can be sure that:

1. The message was created by a party that knows the secret key K, and

2. The message was not changed during transmission.

Fig. 3. Cryptographic mechanism MAC
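The compute-and-compare step can be illustrated with Python's standard library. The card described here uses an ANSI X9.9 (DES-based) MAC; the sketch below swaps in a truncated HMAC-SHA256 purely as a stand-in, and the key and message values are hypothetical.

```python
import hashlib
import hmac

TAG_BYTES = 8  # 64-bit tag, one of the lengths mentioned in the text


def compute_mac(key: bytes, message: bytes) -> bytes:
    """Truncated HMAC-SHA256 tag standing in for MAC_K(M)."""
    return hmac.new(key, message, hashlib.sha256).digest()[:TAG_BYTES]


def verify_mac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the MAC with the shared key and compare in constant time."""
    return hmac.compare_digest(compute_mac(key, message), tag)


key = b"shared-secret-key"        # hypothetical secret key K
message = b"CREDIT 100"           # hypothetical message M
tag = compute_mac(key, message)   # the sender appends this to the message
```

A recipient holding the same key accepts `(message, tag)` but rejects a modified message or a tag produced under a different key.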

Unfortunately, a MAC alone is not enough to protect the messages that contain
the credit and debit commands, because of the replay attack. In a replay
attack the attacker captures a legitimate message together with its MAC and
sends it to the recipient again later. Replaying a message with a credit
command, for example, illegally increases the value of the payment instrument,
which is highly undesirable.

The solution to the replay attack is to make every MAC unique by
parametrizing it with a random value. The MAC value is computed over the
message M and a random value Rnd that is unique for every command. Thus we
need two new operations of the payment instrument:

ASK RANDOM (called GET CHALLENGE in the AT card) asks the payment
instrument for a random value that will be used by the subsequent command.
PUT RANDOM gives the payment instrument a random value, created by the
outside world, that will be used by the subsequent command.

Now we can define the cryptographic requirements for the credit and debit 
operations: 

Credit operation increases the value of the counter, so illegal execution of
this command is highly undesirable and violates the security policy. The
command itself must be secured by a MAC to prevent illegal credits of the
payment instrument. Because the payment instrument must protect itself
against fake credits from the outside world, the random value for the MAC must
be generated by the payment instrument and retrieved by the ASK RANDOM command
(otherwise a "fake" outside world could supply the same random value as
before and thus perform a replay attack).

Because this operation always succeeds (provided it is performed legally),
it is not necessary to cryptographically secure the response (returned
status) of this command.

Debit operation decreases the value of the counter, so illegal execution of
this command is not dangerous and need not be prevented: an attacker cannot
gain anything by performing this operation. The command itself therefore need
not be secured by a MAC.

The status of the operation tells the merchant whether the client is solvent,
so the response message must be cryptographically secured against modification
by a MAC value. A correct MAC value indicates that the payment instrument is
genuine (a fake payment instrument does not know the secret key K and cannot
compute a correct MAC value). A correct MAC value also indicates that
the response message was not modified, i.e. that the client had enough money
on the payment instrument to pay and that its counter value was decreased.
Because the outside world (in this case the merchant) must be assured that
the message is authentic, the random value for the MAC must be generated by the
outside world and entered by the PUT RANDOM command.
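The requirements for the two operations can be sketched as follows. This is a minimal model under stated assumptions, not the AT card implementation: HMAC-SHA256 stands in for the card's MAC, and the message layout, the class `Card` and the status strings are hypothetical.

```python
import hashlib
import hmac
import secrets


def mac(key: bytes, *parts: bytes) -> bytes:
    """64-bit tag over the joined parts (HMAC-SHA256 as a stand-in MAC)."""
    return hmac.new(key, b"|".join(parts), hashlib.sha256).digest()[:8]


class Card:
    def __init__(self, key: bytes) -> None:
        self.key, self.counter = key, 0
        self.challenge = None   # set by get_challenge, consumed by credit
        self.random = None      # set by put_random, consumed by debit

    def get_challenge(self) -> bytes:        # ASK RANDOM
        self.challenge = secrets.token_bytes(8)
        return self.challenge

    def put_random(self, rnd: bytes) -> None:  # PUT RANDOM
        self.random = rnd

    def credit(self, value: int, tag: bytes) -> bool:
        challenge, self.challenge = self.challenge, None  # one use only
        if challenge is None:
            return False
        expected = mac(self.key, b"CREDIT", str(value).encode(), challenge)
        if not hmac.compare_digest(expected, tag):
            return False
        self.counter += value
        return True

    def debit(self, value: int):
        rnd, self.random = self.random, None
        if rnd is None:
            raise RuntimeError("PUT RANDOM must precede DEBIT")
        ok = self.counter >= value
        if ok:
            self.counter -= value
        status = b"OK" if ok else b"FAIL"
        # The response, not the command, carries the MAC.
        return status, mac(self.key, b"DEBIT", str(value).encode(), status, rnd)


key = b"issuer-key"                       # hypothetical shared key K
card = Card(key)
credit_tag = mac(key, b"CREDIT", b"50", card.get_challenge())
first = card.credit(50, credit_tag)
replayed = card.credit(50, credit_tag)    # no fresh challenge: rejected

rnd = secrets.token_bytes(8)              # merchant's random value
card.put_random(rnd)
status, tag = card.debit(20)
genuine = hmac.compare_digest(tag, mac(key, b"DEBIT", b"20", status, rnd))
```

The card chooses the random value for credit and the merchant chooses it for debit, which mirrors who needs protection against replay in each case.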

The resulting protocol for the credit and debit operations is shown in
Figure 4.

Fig. 4. Credit and debit commands (left: debit operation; right: credit operation)



8 Conclusion 



Electronic money products have the potential to provide important benefits to
payment systems if implemented with appropriate security. These systems
cannot be made fully secure against all types of attack. Determining the
appropriate level of security for a particular system should involve
consideration of the magnitude of potential risks, the cost of implementing
varying levels of security, the impact on the functionality of the product and
the implications for privacy.

Abstract. We study three methods based on linear programming and its
generalizations that are often applied to approximate combinatorial
optimization problems. We start by describing an approximation method based
on linear programming; as an example we consider scheduling of jobs on
unrelated machines with costs. The second method presented is based on
semidefinite programming; we show how to obtain a reasonable solution
for the maximum cut problem. Finally, we analyze the method of conditional
probabilities in connection with randomized rounding for routing,
packing and covering integer linear programming problems.

1 Introduction 

Several approximation algorithms for combinatorial problems are related to
linear programming or to generalizations of linear programming. Once an
optimization problem is formulated as an integer mathematical programming
problem, the relaxation over the reals can be used to develop approximation
algorithms in one of the following ways:

1. Primal/dual algorithms: the algorithm finds a feasible solution for the pro- 
blem of interest and a feasible solution for the dual of its relaxation; the ratio 
between the costs of the two solutions is at most some constant r. Then the 
solution is r-approximate. 

2. Rounding: an optimum (or near-optimum) solution is found for the relaxa- 
tion, and it is rounded to yield an integer solution. The ratio between the 
cost of the rounded solution and the fractional one is at most r. Then the 
rounded solution is r-approximate. 

In the first case, the linear programming formulation is used just as a
conceptual tool to prove the approximation bound. In the second case, the
relaxation is actually solved. Rounding a fractional solution is, in general,
a difficult task. Randomization is a primary tool. The algorithms we shall
consider in this paper use linear programming, semidefinite programming
and/or randomized rounding to find a solution.

This paper is organized as follows. In the next section, we study linear
programming by showing how to derive an approximation algorithm for the
particular case of scheduling jobs on unrelated machines with costs. We
continue by describing a method based on semidefinite programming. As an
example we show a randomized approximation algorithm for the maximum cut
problem that randomly rounds the solution of a nonlinear programming
relaxation. In the last section, we study more deeply the randomized rounding
presented in the previous sections. Random rounding has been used to develop
approximation algorithms for the maximum satisfiability problem and the
constraint satisfaction problem, among others. By the use of pessimistic
estimators a de-randomized solution can be found.

2 Linear Programming 

In this section we describe an approximation method based on linear program- 
ming. As an example we consider scheduling of jobs on unrelated machines with 
costs. This scheduling problem can be described as an integer program. We 
describe an approximation algorithm given by Shmoys and Tardos |2f)j . The al- 
gorithm consists of a linear programming LP relaxation followed by rounding 
the fractional solution. The rounding step is based on a minimum cost matching 
algorithm in a bipartite graph. Several approximation algorithms are based on 
linear programming; for a survey we refer to the book edited by Hochbaum [E|. 

A linear program LP has the following form: 



and can be solved in time polynomial in the size of the input unci!- 

In the following, we describe the scheduling problem with unrelated machines.
There are n independent jobs and m parallel machines, where each job has to
be processed by exactly one machine. Job j takes $p_{ij}$ time units and
incurs cost $c_{ij}$ when it is executed on machine i, for $i = 1, \dots, m$
and $j = 1, \dots, n$. A schedule can be described by a mapping f that assigns
a machine $f(j) \in \{1, \dots, m\}$ to each job j. The cost of a schedule f
is given by $\sum_{j=1}^{n} c_{f(j)j}$, and the makespan of a schedule is the
maximum finishing time:

\[
\max_{1 \le i \le m} \sum_{j : f(j) = i} p_{ij}.
\]

The approximation algorithm solves a bicriteria problem: optimizing the
makespan and the cost of a schedule together. Lenstra, Shmoys and Tardos HZl
have given a 2-approximation algorithm for the single criterion problem of
minimizing the makespan. Furthermore, they have proved an inapproximability
result for the single criterion problem: for any $\varepsilon < \frac{1}{2}$
there exists no polynomial time $(1 + \varepsilon)$-approximation algorithm,
unless P = NP. Suppose that there is a schedule with total cost C and makespan
T. The goal is an approximation algorithm that generates a schedule with
makespan at most 2T and cost at most C. To do this we describe the scheduling
problem as an integer linear program:

\[
\begin{array}{lll}
\min & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij} & \\
\text{s.t.} & \sum_{i=1}^{m} x_{ij} = 1 & j = 1, \dots, n \\
 & \sum_{j=1}^{n} p_{ij} x_{ij} \le T & i = 1, \dots, m \\
 & x_{ij} \in \{0, 1\} & i = 1, \dots, m;\ j = 1, \dots, n \\
 & x_{ij} = 0 & \text{if } p_{ij} > T
\end{array}
\]
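The two quality measures of a schedule are easy to evaluate directly. The sketch below computes the cost and the makespan of a given assignment; the instance data and the 0-based indexing are illustrative assumptions.

```python
def schedule_cost(c, f):
    """Total cost sum_j c[f[j]][j] of the schedule f (f[j] = machine of job j)."""
    return sum(c[f[j]][j] for j in range(len(f)))


def makespan(p, f, m):
    """Maximum machine load: max_i of the sum of p[i][j] over jobs with f[j] = i."""
    load = [0] * m
    for j, i in enumerate(f):
        load[i] += p[i][j]
    return max(load)


# Hypothetical instance: m = 2 machines, n = 3 jobs.
p = [[2, 3, 4],   # processing times on machine 0
     [3, 1, 2]]   # processing times on machine 1
c = [[1, 5, 2],   # costs on machine 0
     [4, 1, 3]]   # costs on machine 1
f = [0, 1, 0]     # jobs 0 and 2 on machine 0, job 1 on machine 1

total_cost = schedule_cost(c, f)
finish_time = makespan(p, f, 2)
```

Here machine 0 carries a load of 2 + 4 and machine 1 a load of 1, so the makespan is determined by machine 0.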



Solving the integer linear program is an NP-complete problem. Therefore, we
use a relaxation of the integer linear program. If we relax the binary
constraints $x_{ij} \in \{0, 1\}$ and require only that $x_{ij} \ge 0$ for all
$i, j$, we get a linear program called the LP relaxation of the integer linear
program. We notice that the cost of an optimum LP solution is a lower bound on
the integral optimum cost. Given an optimum solution $x = (x_{ij})$ of the
linear program, we construct an integral assignment with total cost at most C
and makespan at most 2T. Clearly, the cost of any optimum fractional solution
is bounded by C. To get an integral solution we build a bipartite graph
$B(x) = (V, W, E)$ with weights $x'(v, w)$ on the edges $(v, w) \in E$.

The bipartite graph consists of job nodes $W = \{w_j \mid j = 1, \dots, n\}$
and of machine nodes $V = \{v_{is} \mid i = 1, \dots, m,\ s = 1, \dots, k_i\}$,
where $k_i = \lceil \sum_{j=1}^{n} x_{ij} \rceil$. The nodes
$v_{i1}, v_{i2}, \dots, v_{ik_i}$ correspond to machine i. In what follows, we
construct the edges incident to the nodes corresponding to machine i. We may
assume that the jobs (for machine i) are sorted in non-increasing order of
processing time:

\[
p_{i1} \ge p_{i2} \ge \dots \ge p_{in}.
\]

If $\sum_{j=1}^{n} x_{ij} \le 1$ then we have only one node $v_{i1}$ for
machine i. In this case we use an edge $(v_{i1}, w_j) \in E$ with weight
$x'(v_{i1}, w_j) = x_{ij}$ for each job j with $x_{ij} > 0$. Otherwise, let
$j_1$ be the minimum index with $\sum_{j=1}^{j_1} x_{ij} \ge 1$. Then E
contains the edges $(v_{i1}, w_j)$ for $j = 1, \dots, j_1 - 1$ for which
$x_{ij} > 0$, with weights $x'(v_{i1}, w_j) = x_{ij}$, and one additional edge
$(v_{i1}, w_{j_1})$ with weight
$x'(v_{i1}, w_{j_1}) = 1 - \sum_{j=1}^{j_1 - 1} x'(v_{i1}, w_j)$. Using this
definition, the sum of the weights of the edges incident to $v_{i1}$ is
exactly 1. If $\sum_{j=1}^{j_1} x_{ij} > 1$ then a fraction of the value
$x_{ij_1}$ is unassigned. We set
$x''_{ij_1} = x_{ij_1} - x'(v_{i1}, w_{j_1})$; this is the remaining fraction
of job $j_1$ for machine i. If $\sum_{j=1}^{j_1} x_{ij} = 1$ then we have
$x''_{ij_1} = 0$.

Then, we proceed with the jobs $j \ge j_1$ and values
$x''_{ij_1}, x_{i,j_1+1}, \dots, x_{in}$, and construct edges incident to node
$v_{i2}$ in the same way as above. This procedure is iterated $k_i$ times for
the nodes $v_{i1}, v_{i2}, \dots, v_{ik_i}$. The sum of the weights of edges
incident to $v_{is}$ ($s < k_i$) is exactly 1 and the sum of the weights of
edges incident to $v_{ik_i}$ is at most 1. We give an example of this
construction with n = 8 jobs. We
suppose that the jobs are sorted according to non-increasing processing times
on machine 1, and suppose that $x_{11} = \frac{2}{3}$,
$x_{13} = x_{15} = x_{16} = \frac{1}{2}$, $x_{18} = 1$ and that
$x_{12} = x_{14} = x_{17} = 0$. The number of nodes for machine 1 is
$\lceil \frac{2}{3} + \frac{3}{2} + 1 \rceil = 4$. The constructed edges with
corresponding weights for the first machine are given in Table 1.

Table 1. The constructed edges with weights for machine 1

          w_1    w_3    w_5    w_6    w_8
v_11      2/3    1/3
v_12             1/6    1/2    1/3
v_13                           1/6    5/6
v_14                                  1/6
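The construction of the edges for one machine can be sketched as a greedy filling of unit-capacity nodes. The function name `machine_edges` and the fractional values in the usage example are illustrative assumptions; exact fractions are used so that the node sums can be checked without rounding error.

```python
from fractions import Fraction as F


def machine_edges(x_i):
    """Split the fractional row x_i (jobs already sorted by non-increasing
    processing time) into unit-capacity machine nodes.

    Returns one dict per node v_i1, v_i2, ...; each dict maps a job index j
    (1-based) to the edge weight x'(v_is, w_j).
    """
    nodes, current, room = [], {}, F(1)
    for j, x in enumerate(x_i, start=1):
        while x > 0:
            part = min(x, room)      # fraction of job j that fits in this node
            current[j] = part
            x -= part
            room -= part
            if room == 0:            # node exactly matched: open the next one
                nodes.append(current)
                current, room = {}, F(1)
    if current:
        nodes.append(current)
    return nodes


# Hypothetical fractional values for machine 1 and n = 8 jobs.
x_1 = [F(2, 3), F(0), F(1, 2), F(0), F(1, 2), F(1, 2), F(0), F(1)]
nodes = machine_edges(x_1)
```

Every node except possibly the last one receives total weight exactly 1, and a job whose fraction does not fit is split between two consecutive nodes.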



For each machine node $v_{is}$ we define two values

\[
p_{is}^{\max} = \max\{p_{ij} \mid (v_{is}, w_j) \in E\}, \qquad
p_{is}^{\min} = \min\{p_{ij} \mid (v_{is}, w_j) \in E\}.
\]

A non-negative weight function $z : E \to [0, 1]$ is a fractional matching if
for each node $u \in V \cup W$ (of the bipartite graph) the sum of the weights
of the edges incident to u is at most 1. A node is exactly matched if the sum
is exactly 1. We have an integral matching if the weights
$z(u, v) \in \{0, 1\}$. The weight function $x'$ constructed for the optimal
solution x is a fractional matching in B(x) of cost at most C. In this
fractional matching, each node $w_j \in W$ and each node $v_{is}$ for
$i = 1, \dots, m$ and $s < k_i$ is exactly matched. Since the jobs are sorted
according to the processing times on machine i, we have
$p_{is}^{\min} \ge p_{i,s+1}^{\max}$ for each $s = 1, \dots, k_i - 1$.

The algorithm to construct a feasible schedule works as follows: 

Algorithm: 

Step 1: Compute an optimum solution x of the linear program. 

Step 2: Build the bipartite graph B(x) with weight function x' . 

Step 3: Compute a minimum cost (integer) matching M that exactly matches all
job nodes in B(x).

Step 4: For each edge {vis,Wj) G M schedule job j on machine i. 

Theorem 1. The schedule generated by the algorithm above has makespan at
most 2T and total cost at most C.

Proof. We know already that $x'$ is a fractional matching in B(x) of cost at
most C. Then, there exists an integral matching M in B(x) of cost at most C
(see 0). This implies that the constructed schedule has cost at most C. We
prove now that the matching M generates a schedule in Step 4 with makespan at
most 2T. To show this consider machine i and the $k_i$ nodes corresponding to
machine i in B(x). Since for each machine node there is at most one incident
edge in M, the total processing time on machine i can be bounded by
$\sum_{s=1}^{k_i} p_{is}^{\max}$.

Using the inequality $p_{is}^{\min} \ge p_{i,s+1}^{\max}$ for
$s = 1, \dots, k_i - 1$ and $p_{i1}^{\max} \le T$ we obtain:

\[
\begin{array}{rcl}
\sum_{s=1}^{k_i} p_{is}^{\max}
 & = & p_{i1}^{\max} + \sum_{s=2}^{k_i} p_{is}^{\max}
 \ \le\ p_{i1}^{\max} + \sum_{s=1}^{k_i - 1} p_{is}^{\min} \\
 & \le & p_{i1}^{\max} + \sum_{s=1}^{k_i - 1} \sum_{j : (v_{is}, w_j) \in E} p_{ij}\, x'(v_{is}, w_j) \\
 & \le & T + \sum_{j=1}^{n} p_{ij} x_{ij} \ \le\ 2T.
\end{array}
\]

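□

We suppose that n > m. The linear program in our example can be described
as a fractional packing problem of the form described in Encni:

\[
\begin{array}{ll}
\sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij} \le C & \\
\sum_{j=1}^{n} p_{ij} x_{ij} \le T & i = 1, \dots, m \\
\sum_{i=1}^{m} x_{ij} = 1 & j = 1, \dots, n \\
x_{ij} \ge 0 & \text{if } p_{ij} \le T \\
x_{ij} = 0 & \text{if } p_{ij} > T
\end{array}
\]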

The m machine constraints and the cost constraint are the packing con- 
straints, and the remaining constraints correspond to a polytope P (the product 
of n simplices) . The techniques in pnuni can be used to generate an approxima- 
tive solution of the packing program where the load and cost constraints are rela- 
xed by a factor of 1-1- e. The running time of the algorithm in m is 0{mn^ log m), 
and the running time of the algorithm in cni is 0{mfn\ogm). Furthermore, the 
minimum cost matching problem can be solved in 0{n^m + log n) time j2]. 

For non-negative costs and any fixed e > 0, Shmoys and Tardos have 
given also a randomized algorithm that generates a schedule of cost at most 
(1 -I- e)C and makespan at most (2 + e)T and that runs in expected 0(n^ logn) 
time. Recently, for constant number of machines m, an approximation algorithm 
that generates a schedule with cost at most (1 J- e)C and makespan at most 
(1 -I- e)T that runs in 0{n) time has been found |I4^ . 



3 Semidefinite Programming 

In this section we describe a method based on semidefinite programming
for obtaining good approximation algorithms. As an example we demonstrate a
randomized approximation algorithm for the maximum cut (MAX CUT) problem
that generates solutions of expected value at least 0.87856 times the
optimal value. This algorithm uses an elegant technique that randomly rounds
the solution of a nonlinear programming relaxation. This algorithm, given by
Goemans and Williamson Pj, represents the first use of semidefinite
programming for approximation algorithms. By using semidefinite programming
Karger, Motwani and Sudan na have shown how to color a k-colorable graph with
$O(n^{1 - 3/(k+1)})$ colors in polynomial time. Further approximation results
based on semidefinite programming have been obtained in recent years for
different combinatorial optimization problems (see e.g. mm).

A matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite if
$x^T A x \ge 0$ for any vector $x \in \mathbb{R}^n$. For a symmetric matrix A
the following statements are equivalent:

(1) A is positive semidefinite.

(2) $A = B^T B$ for some matrix $B \in \mathbb{R}^{m \times n}$.

A semidefinite program optimizes a linear function of the entries $x_{ij}$
subject to constraints of the following form:

\[
\begin{array}{ll}
\sum_{i,j} a_{ijk}\, x_{ij} = b_k & \forall k \\
X = (x_{ij}) & \text{symmetric, positive semidefinite}
\end{array}
\]

Semidefinite programming is similar to linear programming; e.g. the simplex
method can be generalized to semidefinite programs m-. For any
$\varepsilon > 0$, a semidefinite program can be solved within an additive
error of $\varepsilon$ in time polynomial in the size of the input and
$\log \frac{1}{\varepsilon}$ (see e.g. jSC^SEH).

In the following we describe the MAX CUT problem. Let G = (V, E) be an
undirected graph with non-negative weights $w(i, j)$ on the edges
$(i, j) \in E$. For simplicity, we use $w(i, j) = 0$ for $(i, j) \notin E$.
The maximum cut problem is the problem of finding a partition of V into two
sets $V_1$ and $V_2$ such that the weight of the edges with one endpoint in
$V_1$ and one endpoint in $V_2$,

\[
w(V_1, V_2) = \sum_{v_1 \in V_1,\ v_2 \in V_2} w(v_1, v_2),
\]

is maximized. We call such a partition a cut. The MAX CUT problem is
NP-complete even for unit weights ($w(i, j) = 1$ for $(i, j) \in E$) and
simple graph classes like chordal graphs or complements of bipartite graphs
0 . Furthermore, Håstad m has proved that the MAX CUT problem is not
approximable within a factor of 0.94127. The best previous algorithm, with
approximation bound 0.5, was given by Sahni and Gonzales [24] in 1976. We now
present a semidefinite program for MAX CUT and, later, the randomized
approximation algorithm by Goemans and Williamson.

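First, we give a quadratic integer program for MAX CUT. For each vertex
$i \in V$, we use a variable $x_i$ to indicate whether $i \in V_1$ or
$i \in V_2$:

\[
x_i = \left\{ \begin{array}{rl} -1 & i \in V_1 \\ 1 & i \in V_2 \end{array} \right.
\]

Then, MAX CUT is equivalent to the problem

\[
\begin{array}{ll}
\max & \sum_{i < j} w(i, j)\, \frac{1 - x_i x_j}{2} \\
 & x_i \in \{-1, +1\} \quad i = 1, \dots, n.
\end{array}
\]

We note that the partition $V_1 = \{i \mid x_i = -1\}$,
$V_2 = \{i \mid x_i = 1\}$ corresponds to a cut with weight
$w(V_1, V_2) = \sum_{i < j} w(i, j)\, \frac{1 - x_i x_j}{2}$.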
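The equivalence between the cut weight and the quadratic objective can be checked on a small instance. The graph, its weights and the function names below are illustrative assumptions.

```python
def cut_weight_direct(w, V1, V2):
    """Sum of w(a, b) over all pairs with a in V1 and b in V2."""
    return sum(w.get((min(a, b), max(a, b)), 0) for a in V1 for b in V2)


def cut_weight_quadratic(w, x):
    """Sum over edges i < j of w(i, j) * (1 - x_i * x_j) / 2, x_i in {-1, +1}."""
    return sum(wij * (1 - x[i] * x[j]) / 2 for (i, j), wij in w.items())


# Hypothetical weighted graph on the vertices 1..4, edges stored with i < j.
w = {(1, 2): 3, (1, 3): 1, (2, 3): 2, (3, 4): 4}
x = {1: -1, 2: 1, 3: -1, 4: 1}
V1 = {i for i in x if x[i] == -1}
V2 = {i for i in x if x[i] == 1}

direct = cut_weight_direct(w, V1, V2)
quadratic = cut_weight_quadratic(w, x)
```

An edge contributes its full weight when its endpoints carry opposite signs, since then $(1 - x_i x_j)/2 = 1$, and nothing otherwise.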

Again, solving an integer program is NP-complete. For the MAX CUT problem
we use a nonlinear program instead of a linear program relaxation. The
relaxation consists of allowing $x_i$ to be an n-dimensional unit-length
vector $v_i \in S_n$. The product $x_i x_j$ in the objective function is
replaced by the dot product $v_i \cdot v_j$ of the corresponding vectors. The
set $S_n = \{x \in \mathbb{R}^n \mid x \cdot x = 1\}$ is called the
n-dimensional unit sphere. We get the following relaxation (A):

\[
\begin{array}{ll}
\max & \sum_{i < j} w(i, j)\, \frac{1 - v_i \cdot v_j}{2} \\
 & v_i \in S_n \quad i = 1, \dots, n.
\end{array}
\]

Using $y_{ij} = v_i \cdot v_j$ we can rewrite (A) as the following program (B):

\[
\begin{array}{ll}
\max & \sum_{i < j} w(i, j)\, \frac{1 - y_{ij}}{2} \\
 & Y = (y_{ij}) \text{ symmetric, positive semidefinite} \\
 & y_{ii} = 1 \quad i = 1, \dots, n.
\end{array}
\]

We note that (B) is a semidefinite program and that (A) and (B) are equivalent.
To see the second statement, observe that a solution of (A) can be directly
transformed into a solution of (B). Furthermore, given a solution of (B), a
Cholesky decomposition can be used to reconstruct the vectors $v_i$ without
changing the value of the objective function. We now present the randomized
algorithm for the MAX CUT problem:

Algorithm:

Step 1: Solve the semidefinite program (B).

Step 2: Using a Cholesky decomposition, obtain an optimal set of vectors $v_i$.

Step 3: Pick a random unit-length vector $r \in S_n$.

Step 4: Set $V_1 = \{i \mid v_i \cdot r \ge 0\}$ and $V_2 = \{i \mid v_i \cdot r < 0\}$.

In other words, we choose a random hyperplane (with r as its normal) and
partition the vertices V into

— vertices in $V_1$ whose vectors lie above the plane and

— vertices in $V_2$ whose vectors lie below the plane.
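Steps 3 and 4 can be sketched as follows, assuming the unit vectors from Step 2 are already available. Solving the semidefinite program itself is not shown; the vectors in the usage example are hypothetical, and the direction r is drawn from independent Gaussians, a standard way to sample a uniformly distributed direction.

```python
import random


def random_hyperplane_cut(vectors, rng):
    """Draw a random direction r and split the vertices by the sign of v_i . r."""
    n = len(next(iter(vectors.values())))
    # Independent Gaussian coordinates give a uniformly distributed direction;
    # normalising r is unnecessary because only the sign of v_i . r matters.
    r = [rng.gauss(0.0, 1.0) for _ in range(n)]
    V1 = {i for i, v in vectors.items()
          if sum(a * b for a, b in zip(v, r)) >= 0}
    V2 = set(vectors) - V1
    return V1, V2


# Hypothetical unit vectors: vertices 1 and 3 coincide, vertex 2 is antipodal.
vectors = {1: (1.0, 0.0), 2: (-1.0, 0.0), 3: (1.0, 0.0)}
V1, V2 = random_hyperplane_cut(vectors, random.Random(0))
```

Identical vectors always end up on the same side, while antipodal vectors are separated by every hyperplane that does not contain them.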

Next, we analyse the quality of this solution. Let W denote the value of
the cut produced by the algorithm, and let E[W] be its expected value. We
denote by sgn(z) the sign of a real z; it is +1 if z is non-negative
and -1 otherwise.

Theorem 2.

\[
E[W] \ge 0.878 \sum_{i < j} w(i, j)\, \frac{1 - v_i \cdot v_j}{2}.
\]

Let W(OPT) be the value of an optimum cut, and let $W_{SDP}(OPT)$ be the
optimum value of the semidefinite relaxation. Since the right-hand side of the
inequality above is the optimum value $W_{SDP}(OPT)$ of the relaxation (A)
multiplied by 0.878 and since $W_{SDP}(OPT)$ is an upper bound on W(OPT), we
get a cut whose expected weight is at least 0.878 times W(OPT). To show the
Theorem above, we first use the linearity of expectation and get

\[
E[W] = \sum_{i < j} w(i, j) \Pr[\mathrm{sgn}(v_i \cdot r) \ne \mathrm{sgn}(v_j \cdot r)].
\]

Then, the Theorem is implied by the following Lemmas:

Lemma 3.

\[
\Pr[\mathrm{sgn}(v_i \cdot r) \ne \mathrm{sgn}(v_j \cdot r)] = \frac{1}{\pi} \arccos(v_i \cdot v_j).
\]

Proof. The probability that a random hyperplane separates the vectors $v_i$
and $v_j$ is directly proportional to the angle between the vectors, and the
angle is $\theta = \arccos(v_i \cdot v_j)$. Furthermore,

\[
\Pr[\mathrm{sgn}(v_i \cdot r) \ne \mathrm{sgn}(v_j \cdot r)] = 2 \Pr[v_i \cdot r \ge 0,\ v_j \cdot r < 0].
\]

The set $\{r \mid v_i \cdot r \ge 0,\ v_j \cdot r < 0\}$ intersected with the
sphere is a spherical digon of angle $\theta$ and has a measure equal to
$\frac{\theta}{2\pi}$ times the measure of the full sphere. This implies that
$\Pr[v_i \cdot r \ge 0,\ v_j \cdot r < 0] = \frac{\theta}{2\pi}$, which proves
the Lemma. □
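The statement can be checked numerically in the plane spanned by the two vectors. The sketch below is an illustration rather than a proof: it replaces the random direction by a deterministic grid of directions on the unit circle and counts how many of them separate two unit vectors at angle theta. The function name and the grid size are assumptions.

```python
import math


def separation_fraction(theta, steps=3600):
    """Fraction of grid directions r whose hyperplane separates two unit
    vectors enclosing the angle theta."""
    vi = (1.0, 0.0)
    vj = (math.cos(theta), math.sin(theta))
    separated = 0
    for k in range(steps):
        phi = 2 * math.pi * (k + 0.5) / steps
        r = (math.cos(phi), math.sin(phi))
        si = vi[0] * r[0] + vi[1] * r[1] >= 0
        sj = vj[0] * r[0] + vj[1] * r[1] >= 0
        separated += (si != sj)
    return separated / steps


fractions_found = {theta: separation_fraction(theta) for theta in (0.5, 1.0, 2.0, 3.0)}
```

For each tested angle the counted fraction agrees with theta / pi up to the resolution of the grid.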



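Lemma 4. Let $\alpha = \min_{0 < \theta \le \pi} \frac{2}{\pi} \frac{\theta}{1 - \cos\theta}$. Then,

\[
E[W] \ge \alpha \cdot \frac{1}{2} \sum_{i < j} w(i, j) (1 - v_i \cdot v_j).
\]

Proof. Using the Lemma above, we get

\[
E[W] = \sum_{i < j} w(i, j)\, \frac{1}{\pi} \arccos(v_i \cdot v_j).
\]

Then, using the non-negativity of the weights $w(i, j)$ and the inequality

\[
\frac{1}{\pi} \arccos(y) \ge \alpha \cdot \frac{1}{2} (1 - y)
\]

for $-1 \le y \le 1$ applied to $y = v_i \cdot v_j$ we get the statement of the
Lemma. The inequality follows directly by changing the variables
$y = \cos\theta$. □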

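Finally, the quality of the approximation algorithm is given by the estimation
of $\alpha$:

Lemma 5. $\alpha > 0.87856$.

Proof. Consider the function $\frac{2}{\pi} \frac{\theta}{1 - \cos\theta}$.
First, we observe that $\cos\theta \ge 1 - \frac{2}{\pi}\theta$ for
$0 \le \theta \le \frac{\pi}{2}$ and, equivalently,
$\frac{2}{\pi} \frac{\theta}{1 - \cos\theta} \ge 1$ for
$0 < \theta \le \frac{\pi}{2}$. Furthermore, we observe that the function
$f(\theta) = 1 - \cos\theta$ is concave in the interval $[\frac{\pi}{2}, \pi]$.
This implies

\[
f(\theta) \le f(\theta_0) + (\theta - \theta_0) f'(\theta_0)
\]

for any $\theta_0 \in [\frac{\pi}{2}, \pi]$. This can be rewritten to

\[
1 - \cos\theta \le 1 - \cos\theta_0 + (\theta - \theta_0) \sin\theta_0
 = \theta \sin\theta_0 + (1 - \cos\theta_0 - \theta_0 \sin\theta_0).
\]

For $\theta_0 = 2.331122$ we have
$1 - \cos\theta_0 - \theta_0 \sin\theta_0 < 0$, and obtain as a consequence

\[
1 - \cos\theta < \theta \sin\theta_0.
\]

This implies that

\[
\alpha = \min_{0 < \theta \le \pi} \frac{2}{\pi} \frac{\theta}{1 - \cos\theta}
 > \frac{2}{\pi} \frac{1}{\sin\theta_0} > 0.87856.
\]

□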
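The constant can also be examined numerically. The following sketch is a numerical illustration only, not a replacement for the proof: it minimises the ratio over a fine grid of angles and evaluates the closed-form lower bound at the value of theta_0 used above. The grid size is an arbitrary assumption.

```python
import math


def ratio(theta):
    """(2 / pi) * theta / (1 - cos(theta)), the function minimised in Lemma 4."""
    return (2 / math.pi) * theta / (1 - math.cos(theta))


# Grid search over (0, pi]; each entry is a pair (ratio value, angle).
steps = 100_000
alpha_grid, theta_star = min(
    (ratio(math.pi * k / steps), math.pi * k / steps)
    for k in range(1, steps + 1)
)

theta0 = 2.331122
lower_bound = 2 / (math.pi * math.sin(theta0))
```

The grid minimum is attained close to theta_0, and both the grid minimum and the bound 2 / (pi sin theta_0) exceed 0.87856.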







K. Jansen and J. Rolim 



Finally, the algorithm can be implemented in polynomial time. We suppose
that the weights are integral. Using an approximation algorithm for the
semidefinite program, for any $\varepsilon > 0$, we get a set of vectors $v_i$
in polynomial time with objective value $\ge W_{SDP}(OPT) - \varepsilon$.
Using these vectors, the randomized algorithm produces a cut with expected
weight $\ge \alpha (W_{SDP}(OPT) - \varepsilon) \ge (\alpha - \varepsilon) W(OPT)$.



4 Random Rounding and Conditional Expectation 



Random rounding was introduced by Raghavan and Thompson [23]. The
de-randomized rounding using a pessimistic estimator is due to Raghavan [22].
Both results are also presented in Raghavan’s PhD Thesis m- 

Random rounding has been used to develop approximation algorithms for the 
Maximum Satisfiability problem jSj and the Constraint Satisfaction problem pm- 
In these algorithms, the probability distribution used to round the variables is 
not the solution of the relaxation, but rather a convex combination of the solution 
of the relaxation and of the uniform distribution. The de-randomization of these 
algorithms is easier since, basically, any 0/1 solution is feasible. 

The performance of rounding can be improved in special cases, for example 
for resource-constrained scheduling problem m and for packing and covering 
integer linear programs UH- In both cases, the authors de-randomize their ro- 
unding schema using new pessimistic estimators. The framework is as follows. 
Say that we have a probability space $(\Pr, \{0, 1\}^n)$ and a set of "good
strings" $A \subseteq \{0, 1\}^n$ such that $\Pr(A) > \varepsilon$ for some
$\varepsilon > 0$; assume also that the algorithm that we want to
de-randomize uses randomness only to find a string $x \in \{0, 1\}^n$.

A deterministic algorithm can construct such a string in the following way.
For $i = 0, 1, \dots, n$ and $(b_1, \dots, b_i) \in \{0, 1\}^i$, let us call
$P^i_{b_1, \dots, b_i} = \Pr[x \notin A \mid x_1 = b_1, \dots, x_i = b_i]$.
Of course we have $P^0 = 1 - \Pr[A] < 1$; moreover, for any i and any
$(b_1, \dots, b_i)$, either
$P^{i+1}_{b_1, \dots, b_i, 0} \le P^i_{b_1, \dots, b_i}$ or
$P^{i+1}_{b_1, \dots, b_i, 1} \le P^i_{b_1, \dots, b_i}$.
Observe that $P^n_{b_1, \dots, b_n}$ is either equal to 0 or 1, and is equal
to 1 if and only if $(b_1, \dots, b_n) \notin A$.

Algorithm cond-prob
for i = 1 to n do
    if $P^{i}_{b_1, \dots, b_{i-1}, 0} \le P^{i-1}_{b_1, \dots, b_{i-1}}$ then
        $b_i := 0$
    else
        $b_i := 1$;
return $(b_1, \dots, b_n)$;

The above algorithm maintains the invariant that, at any iteration of the for
loop, $b_i$ receives a value such that
$P^i_{b_1, \dots, b_i} \le P^{i-1}_{b_1, \dots, b_{i-1}}$. By induction, this
implies that $P^n_{b_1, \dots, b_n} \le P^0 < 1$, and so
$P^n_{b_1, \dots, b_n} = 0$ and $(b_1, \dots, b_n) \in A$.
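For a tiny instance the conditional probabilities can be computed by brute force, which makes the algorithm easy to try out. In the sketch below the bits are independent, the set A is given as a predicate `good`, and exact fractions avoid rounding issues; the enumeration is exponential in n and serves only as an illustration. All names and the example set are assumptions.

```python
from fractions import Fraction
from itertools import product


def failure_probability(prefix, probs, good):
    """Pr[x not in A | x_1..x_i = prefix] for independent bits with
    Pr[x_j = 1] = probs[j], computed by enumerating all completions."""
    i, n = len(prefix), len(probs)
    total = Fraction(0)
    for tail in product((0, 1), repeat=n - i):
        if not good(tuple(prefix) + tail):
            weight = Fraction(1)
            for j, bit in enumerate(tail, start=i):
                weight *= probs[j] if bit else 1 - probs[j]
            total += weight
    return total


def cond_prob(probs, good):
    """Fix the bits one by one, never letting the failure probability grow."""
    prefix = []
    for _ in range(len(probs)):
        current = failure_probability(prefix, probs, good)
        if failure_probability(prefix + [0], probs, good) <= current:
            prefix.append(0)
        else:
            prefix.append(1)
    return tuple(prefix)


def good(x):
    """Hypothetical good set A: strings of length 4 with exactly two ones."""
    return sum(x) == 2


probs = [Fraction(1, 2)] * 4
result = cond_prob(probs, good)
```

Starting from a failure probability of 5/8, the procedure fixes the bits so that the probability never increases and ends with a string in A.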

This method could, in principle, be applied to almost any randomized
algorithm. In practice, however, the computation of the conditional
probabilities is a very complicated task.

A careful examination of the proof of correctness of the method reveals that
it is not really necessary to compute all the conditional probabilities
exactly. This is formalized below.



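Definition 6. A pessimistic estimator for set A and probability distribution
Pr is a set of values
$\{U^i_{b_1, \dots, b_i}\}_{i = 0, \dots, n;\ (b_1, \dots, b_i) \in \{0, 1\}^i}$
such that the following properties hold.

1. $U^0 < 1$;

2. $\forall i = 0, \dots, n - 1$, for all $(b_1, \dots, b_i) \in \{0, 1\}^i$,
$\min\{U^{i+1}_{b_1, \dots, b_i, 0},\ U^{i+1}_{b_1, \dots, b_i, 1}\} \le U^i_{b_1, \dots, b_i}$;

3. $\forall i = 0, \dots, n$, for all $(b_1, \dots, b_i) \in \{0, 1\}^i$,
$P^i_{b_1, \dots, b_i} \le U^i_{b_1, \dots, b_i}$.

If we have an algorithm that computes $U^i_{b_1, \dots, b_i}$ in poly(n) time,
then we are done: we can run algorithm cond-prob using $U^i_{b_1, \dots, b_i}$
in place of $P^i_{b_1, \dots, b_i}$. The proof of correctness is the same.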
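A classic textbook instance of such an estimator, used here as an illustration and not taken from this paper, is the union bound for 2-colouring a hypergraph: the bad event is that some edge is monochromatic, and U is the sum over the edges of the conditional probability that the edge becomes monochromatic. The hypergraph below is hypothetical.

```python
from fractions import Fraction


def estimator(edges, colours):
    """U = sum over edges of Pr[edge ends up monochromatic | fixed colours],
    where the uncoloured vertices get independent fair-coin colours."""
    total = Fraction(0)
    for edge in edges:
        seen = {colours[v] for v in edge if v in colours}
        free = sum(1 for v in edge if v not in colours)
        if len(seen) == 2:
            continue                          # two colours present: cannot fail
        if len(seen) == 1:
            total += Fraction(1, 2 ** free)   # rest must repeat the seen colour
        else:
            total += Fraction(2, 2 ** free)   # all-0 or all-1
    return total


def two_colour(vertices, edges):
    """Colour the vertices one by one, never letting the estimator grow."""
    colours = {}
    for v in vertices:
        colours[v] = 0
        u0 = estimator(edges, colours)
        colours[v] = 1
        if u0 <= estimator(edges, colours):
            colours[v] = 0
    return colours


# Hypothetical hypergraph: 5 edges of size 4, so U^0 = 5 * 2 / 16 < 1.
vertices = range(1, 8)
edges = [(1, 2, 3, 4), (3, 4, 5, 6), (1, 2, 5, 6), (1, 3, 5, 7), (2, 4, 6, 7)]
start = estimator(edges, {})
colouring = two_colour(vertices, edges)
```

The estimator is the average of its two successors, so one of them is never larger (property 2), and the union bound makes it an upper bound on the failure probability (property 3). Each evaluation takes time linear in the input, unlike the exact conditional probability.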

The method of conditional probabilities is a standard way to obtain deter- 
ministic constructions out of an existence proof that involves the probabilistic 
method. A clear exposition is in P^. Pessimistic estimators are defined in m 

Randomized rounding is an algorithmic technique that is suitable for
de-randomization using conditional probabilities. The general framework is as
follows: we have a problem that can be formulated as an integer linear program
(ILP) with 0/1 variables. We relax the ILP to a linear program (LP) and we
solve it to optimality. Then, we interpret the fractional solution obtained in
this way as a probability distribution over the variables. The constraints of
the LP are satisfied with high probability and the expected value of the
objective function is close to the value of the relaxation.



Theorem 7. Let $x = (x_1, \dots, x_n)$ be a vector satisfying
$a^T \cdot x = b$, $0 \le x_i \le 1$. Define the random variables
$y_1, \dots, y_n$ such that $y_i$ is equal to 1 with probability $x_i$ and
equal to 0 with probability $(1 - x_i)$; then, for any $f > 0$, the following
holds with probability at least $(1 - n^{-f})$:

\[
b - a_{\max} \sqrt{f n \log n} \ \le\ a^T \cdot y \ \le\ b + a_{\max} \sqrt{f n \log n} \qquad (1)
\]

where $a_{\max} = \max_i |a_i|$.

The Theorem also holds for rounding inequalities. It is also clear that the
Theorem can be extended to systems of m equations; in this case the error will
be $a_{\max} \sqrt{f n \log mn}$. Let $x'$ be a fractional solution to the
following linear program.

\[
\begin{array}{ll}
\max & c^T \cdot x \\
\text{subject to} & Ax \le b \\
 & 0 \le x \le 1
\end{array}
\]

Construct an integer solution y probabilistically by setting, independently
for any j, $y_j = 1$ with probability $x'_j$ and $y_j = 0$ with probability
$1 - x'_j$. With high probability, the resulting solution has cost at least
$(1 - o(1))\, c^T \cdot x'$ and satisfies

\[
Ay \le b + O(a_{\max} \sqrt{n \log mn}) \qquad (2)
\]

where $a_{\max}$ is the largest entry of A.
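The rounding step itself takes only a few lines. The sketch below rounds a hypothetical fractional vector and compares the deviation of a single row with a bound of the form used in Theorem 7; the seed, the instance, the choice f = 2 and the use of the natural logarithm are assumptions made for the illustration.

```python
import math
import random


def randomized_round(x, rng):
    """Set y_j = 1 with probability x_j and y_j = 0 otherwise, independently."""
    return [1 if rng.random() < xj else 0 for xj in x]


rng = random.Random(1998)          # fixed seed so that the run is repeatable
n, f = 200, 2
x = [0.5] * n                      # hypothetical fractional solution
a = [1.0] * n                      # hypothetical row with a_max = 1
b = sum(aj * xj for aj, xj in zip(a, x))

y = randomized_round(x, rng)
deviation = abs(sum(aj * yj for aj, yj in zip(a, y)) - b)
bound = max(abs(aj) for aj in a) * math.sqrt(f * n * math.log(n))
```

Variables that are already integral are never changed by the rounding, since a value of 0 or 1 is kept with probability 1.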

The above fundamental results have direct application to randomized appro- 
ximation algorithms for routing problems and for problems expressible as integer 
linear programs in packing or covering forms (that generalize, respectively, hy- 
pergraph matching and set cover). 

De-randomization uses a pessimistic estimator and yields the following result.

Theorem 8. There exists a polynomial time algorithm that, given a vector
$x = (x_1, \ldots, x_n)$, $0 \le x_i \le 1$, and an $m \times n$ matrix $A$, finds a vector
$y = (y_1, \ldots, y_n) \in \{0, 1\}^n$ such that

$$Ax - O(a_{\max}\sqrt{n \log mn}) \;\le\; Ay \;\le\; Ax + O(a_{\max}\sqrt{n \log mn}) \qquad (3)$$

where $a_{\max} = \max_{i,j} |a_{ij}|$. In addition, for the rows of $A$ with all non-negative
entries, the stronger bound

^ QijXj - log n) < ^ Qi^yj < ^ aijXj -k log n) 

i j i 

where = max^ \aij\. 
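
The idea of de-randomizing by conditional probabilities can be illustrated with a small Python sketch. It fixes the variables one at a time and always keeps the value that does not increase an estimator of the failure probability. The estimator used here (a sum of exponential moments of the row deviations) is a simplified choice made for this illustration only; it is not claimed to be the estimator behind Theorem 8, and the function names and the parameter `t` are assumptions.

```python
import math


def mgf_factor(a, x, s):
    """E[exp(s * a * (Y - x))] for a Bernoulli variable Y with mean x."""
    return x * math.exp(s * a * (1 - x)) + (1 - x) * math.exp(-s * a * x)


def estimator(A, x, y_fixed, t):
    """Conditional expectation of sum_i 2*cosh(t * D_i), given the fixed prefix,
    where D_i = sum_j A[i][j] * (Y_j - x_j)."""
    k = len(y_fixed)
    total = 0.0
    for row in A:
        dev = sum(row[j] * (y_fixed[j] - x[j]) for j in range(k))
        up, down = math.exp(t * dev), math.exp(-t * dev)
        for j in range(k, len(x)):  # variables that are still random
            up *= mgf_factor(row[j], x[j], t)
            down *= mgf_factor(row[j], x[j], -t)
        total += up + down
    return total


def derandomized_round(A, x, t):
    """Fix y_1, ..., y_n in turn, never letting the estimator grow."""
    y = []
    for _ in x:
        u0 = estimator(A, x, y + [0], t)
        u1 = estimator(A, x, y + [1], t)
        y.append(0 if u0 <= u1 else 1)
    return y


A = [[1, 2, 0, 1], [0, 1, 3, 1], [2, 0, 1, 1]]
x = [0.5, 0.25, 0.75, 0.5]
print(derandomized_round(A, x, t=0.5))
```

Since the estimator before fixing a variable is a weighted average of its two possible values afterwards, the smaller of the two never exceeds it. Consequently every row deviation of the final vector is at most ln(U) / t in absolute value, where U is the initial value of the estimator.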

The very notion of pessimistic estimator was introduced in order to
prove Theorem 8 [22].



Abstract. Based on the specific requirements that electronic commerce
(E-Commerce) places on adequate software system support, this contribution
gives an overview of the distributed systems technology which is (or
will shortly be) available for open and heterogeneous electronic commerce
applications. Starting from basic communication mechanisms, this includes
(transactionally secure) remote procedure call and database access mechanisms,
service trading and brokerage functions, as well as security aspects such as
notary and non-repudiation functions. Further important elements of a system
infrastructure for E-Commerce applications are common middleware
infrastructures, componentware techniques, distributed and mobile agent
technologies, etc. Increasingly important new topics in this area are
workflow management support for compound and distributed E-Commerce
services as well as negotiation protocols to support both the settlement and the
fulfillment of electronic contracts in E-Commerce applications. In addition to an
overview of the state of the art of the respective technology, the paper also
briefly presents some aspects of related projects conducted by the authors jointly
with international partners (sponsored by EU/ACTS, EU/ESPRIT, DFG) in
order to realize some of the important new functions of a systems infrastructure
for open distributed E-Commerce applications.



Keywords: Distributed Systems, Electronic Commerce, Electronic Contracting, 
Middleware, Workflow Management, Service Trading/Brokerage 



1 Introduction 

Electronic Commerce (E-Commerce) is frequently considered the most important
application area of open worldwide computer network infrastructures such as the
Internet. Existing global computer networks already provide access to a nearly
unbounded number of different functions and services. System support for a complex
distributed application area such as E-Commerce comprises both an increasingly
wide variety of functions and techniques already known from traditional
communication, information, and cooperation support systems and more specific
functions that support, for example, service selection, trading, contract negotiation,
security, and payment activities, to name just a few [12, 9]. It has to fulfill strong
flexibility and interoperability requirements on its respective software components,
both at the system support level and at the application level. It also reflects changing
requirements and preferences of globally distributed, heterogeneous, cooperative
organization structures.



1.1 Centralized vs. Decentralized Architectures 

For example, many computer system customers today demand a re-centralization of
enterprise computing systems in order to reduce roll-out efforts and maintenance
costs. This development can be considered a sober response to the idea of
distributing any service at any time on any kind of heterogeneous hardware and
operating system environment. Today, certain “low-level” services such as DNS, NFS,
HTTP, and SMTP are accepted as inherently distributed. On the other hand, there are
many others that were expected to become distributed applications in the early 1990s
and before: distributed databases, shared editing, and application-level extensions to
the telecommunication infrastructure. However, these have not succeeded in
day-to-day practice so far. Why is that?

One may argue that the former services are historically better understood since they
have been developed for over 15 years, but there is yet another reason for the lack of
success of the latter: the complexity of their respective specifications. In most cases in
which a centralized alternative existed without prohibitive costs, this option was
usually chosen.

Despite that, however, the Internet has in the meantime established a drastically de- 
centralized communication platform that flattens hierarchies, overcomes organiza- 
tional borders and liberates small companies and individuals from high investment 
costs for communication services. This development also stimulates the cooperation 
between distributed business partners in many ways. 

Finally, it is trivial to state that major organizations today are usually distributed
across cities, regions, and countries. Since they have to coordinate the exchange of
goods, services, and payments across their organizational boundaries, and therefore
usually across long distances as well, such electronic commerce applications have a
quite natural need for system software support by appropriate distributed systems.
In contrast to the above-mentioned areas, the distribution of separate applications
and their software components is here not merely an option but a necessary
precondition for realizing adequate open system support for such electronic
marketplaces.

In the following, we first analyze the application fields for electronic commerce 
systems a bit more systematically and then provide some examples for components of 
an underlying system support technology. 



1.2 Modeling Commercial Transactions 

In order to provide a systematic classification of electronic commerce technology, two
viewpoints can in principle be taken: a distinction can be made either between
businesses, consumers, and public authorities, or between the respective roles of, e.g.,
buyers and sellers of goods and services [6]. However, if we consider single persons as
legal entities, and thus as market participants on an equal footing with companies, the
borders between these categories blur. For this reason we follow the so-called “phase
model” for commercial transactions [19]:

• In the first information phase, market participants offer product specifications, 
look for possible transaction partners, compare product specifications and prices, 
and evaluate offers. 

• Then, after an initial contact has been established between some market
participants, respective (service) offers and counter-offers are exchanged during the
so-called contract negotiation phase. This negotiation process either leads to
agreed terms and conditions or is abandoned.

• If a contract is established, all participants first commit to the contract with their
respective signatures; then the agreed assets are exchanged during the contract
performance (or: execution) phase. The time span of this phase may range from a
few seconds up to several years.




Fig. 1. Phases of Business Transactions 



Following these phases, the services required for an electronic marketplace can be
clearly separated:

1. Information phase: This phase may be supported by (computer) functions like
online catalogues, search engines, and banner advertising.

2. Negotiation phase: Here support for telecollaboration, negotiation protocols and 
strategies may be required. 

3. Execution phase: During this phase, workflow management, business process inte- 
gration among market participants, electronic payment systems, EDI-based mes- 
sage exchange functions, etc. may be provided in order to support the automatic 
execution of E-Commerce applications. 

In between these phases, the following additional services may be required: 

1. Brokerage support in order to select and match respective offers and inquiries, to
form a (service providing) consortium, or to set up the negotiation session for all
parties of the commercial transaction.

2. Signing support to enter the execution phase by establishing a contract and ensuring
that all parties sign it. This process may be supported by ‘trusted third parties’
such as ‘certification authorities’ or ‘electronic notaries’.
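
The phase model, together with the two intermediate services, can be read as a small state machine. The following Python sketch is one possible encoding under that reading; the event names (`brokerage`, `signing`, `abandon`, `fulfilled`) and the two terminal states are assumptions introduced for this illustration.

```python
from enum import Enum


class Phase(Enum):
    INFORMATION = "information"
    NEGOTIATION = "negotiation"
    EXECUTION = "execution"
    ABANDONED = "abandoned"   # negotiation ended without agreement
    COMPLETED = "completed"   # all agreed assets have been exchanged


# Allowed transitions, keyed by (current phase, event).
TRANSITIONS = {
    (Phase.INFORMATION, "brokerage"): Phase.NEGOTIATION,
    (Phase.NEGOTIATION, "abandon"): Phase.ABANDONED,
    (Phase.NEGOTIATION, "signing"): Phase.EXECUTION,
    (Phase.EXECUTION, "fulfilled"): Phase.COMPLETED,
}


def advance(phase, event):
    """Return the phase reached by `event`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(phase, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in phase {phase.value}")


phase = Phase.INFORMATION
for event in ("brokerage", "signing", "fulfilled"):
    phase = advance(phase, event)
    print(event, "->", phase.value)
```

Keeping the transitions in a table makes it explicit that the execution phase can only be entered through signing.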

In order to keep the model simple, yet without unrealistic abstraction, we consider any
possible good or service as a service. Specifically:

• A payment is also a “service” provided by the customer. The result of the service is
the transfer of data which is interpretable as a transfer of value. This might be either
an electronic coin or the settlement authorization between two bank accounts.

• In addition, a tangible good can also be represented in the system as a “service”: it
is selected, ordered, and paid for electronically, and even the physical delivery is
accompanied by a range of services and data communications that may be used by the
commercial parties (for example, transfers of electronic EDI documents, access to
information on the delivery state, etc.).



1.3 Organization of the Rest of the Paper 

The rest of this paper is organized as follows: In Section 2, we provide an overview of
current system technology components for the respective phases as defined above.
Afterwards, in Section 3, a system support reference architecture as developed in the
EU/ESPRIT project COSMOS is presented as an example of a possible integration of
these phases and technologies under a unified electronic contracting model. Finally,
Section 4 provides a perspective on future trends in system support for E-Commerce
applications as well as an outlook on the COSMOS project.



2 Distributed Systems Support for Commercial Transactions 

Following the steps of the phase model, this section discusses requirements and
solutions for distributed systems applications in the identified electronic commerce
areas.



2.1 Catalogue Services 

To inform customers about the range of products and their specifications, catalogue
services are used as a shared information system for both vendors and customers. A
catalogue service also establishes a comfortable front end for the subsequent
transaction phases, payment and product delivery. Internally, catalogues are fed by the
vendor’s stock management system to keep the information displayed in sync with the
physical warehouse.

Today, a catalogue service is typically deployed by a single vendor who intends to make
his range of tangible or intangible (‘soft’) goods accessible to the customer. For small
vendors, a catalogue may be hosted by a third party in the same way as a web server
is provided by an Internet Service Provider. In this case, an individual vendor remains
responsible for his own ‘shop’ in terms of presentation and product data. The
settlement of payments and possibly the delivery of soft goods, however, can be
centralized and unified by the shopping mall provider.

As a next step, providers of catalogue services tend to further reduce the effort
of offering goods in a mall system by allowing vendors to enter single offers into the
catalogue. This leads to an offer database in which competing market participants
can register their offers in a suitable category.

This concept was already addressed several years ago by the ODP trader service
[11], which is suited to storing (‘exporting’ in ODP terminology) not only product
specifications but also service specifications in a formal sense: services are understood
as instances of a service type that includes service attributes and the interface type. In
addition to exporting, a trader also provides the matching of offers and inquiries. Later,
this technology was incorporated into CORBA standardization as the Trading Object
Service [16].
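
The export and matching functions of a trader can be sketched as follows. This is a deliberately simplified Python illustration: the classes `Offer` and `Trader`, their methods, and the sample attributes are hypothetical and do not reproduce the ODP or CORBA interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    service_type: str   # category under which the offer is registered
    provider: str
    attributes: dict    # service attributes, e.g. price or delivery time


@dataclass
class Trader:
    offers: list = field(default_factory=list)

    def export(self, offer):
        """Register (export) an offer in the offer database."""
        self.offers.append(offer)

    def import_(self, service_type, constraint):
        """Return all offers of the given type whose attributes satisfy
        the importer's constraint (a predicate over the attribute dict)."""
        return [o for o in self.offers
                if o.service_type == service_type and constraint(o.attributes)]


trader = Trader()
trader.export(Offer("printing", "P1", {"price": 5, "days": 3}))
trader.export(Offer("printing", "P2", {"price": 3, "days": 10}))
print([o.provider for o in trader.import_("printing", lambda a: a["days"] <= 5)])
```

In this sketch the importer's inquiry is an ordinary predicate; a real trader would accept a declarative constraint expression instead.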



2.2 Service Brokerage 

Usually, catalogue access ends in a purchase that does not require any further
refinement of a contract or other terms: the good is simply purchased at the offered
price. However, if not only human users are involved, a formalized matchmaking
service can be utilized to bring together customers and suppliers or even a group of
market participants. In this case a brokerage service may be used. To accomplish its
task, such a broker requires formal specifications of both the services offered and
those demanded. Again, the ODP trader is well suited to this activity: with
service specifications for different offer categories already at hand, an importing client
only needs to specify the required service by using an OQL (Object Query Language)
statement.

The trader may also be applied to more than two participants in a commercial 
transaction by specifying a set of required services which are obtained step-by-step 
from the offer database. 

Brokerage services are incorporated into different electronic commerce applications 
today. To name just a few examples: 

• First of all, globally distributed trading/brokerage functions can generally be used
for selecting the “best possible” services (according to pre-defined criteria) in an
open distributed service environment (such as, e.g., the Internet) [14].

• In the specific context of an open digital library (as an important, dedicated exam- 
ple of an open service market), e.g., the MEDOC project prototype uses a broker- 
age function for the matching of literature offered and demanded by users of elec- 
tronic library systems [1]. 







W. Lamersdorf, M. Merz, and T. Tu 



• Another ‘Service Broker’ has been developed as a part of the EU/ACTS research 
project OSM (Open Service Model) [15] in the context of service selection for E- 
Commerce applications. The respective broker architecture is here based on the 
OMG CORBA standardization of a Business Object Component Architecture 
(BOCA) which aims at tying together service offers that have been entered into a 
common catalogue [17]. 



2.3 Service Negotiation 

Negotiation is the process of reaching an agreement on a service specification. This
may take place either out-of-band, by letting market participants negotiate without
electronic means, or on-line. In the latter case, several stages of automation are
possible:

• Use of collaboration tools. In this case, human users are involved in the negotiation 
process. They use, e.g., a shared-editing tool that allows them to concurrently edit a 
document in a consistent way. The negotiation is free-form, i.e., there are no re- 
strictions for the order of document accesses or the structure of the document. 

• Use of negotiation protocols. In this case, either human users or software
components participate in the negotiation. The negotiation subject is still unstructured,
i.e., the participants ‘know’ how to deal with it. The ordering of document accesses,
however, is formalized and parameterized, i.e., a negotiation protocol is applied to
specify which party delivers which information at which stage to whom. The
negotiation can be understood as a workflow process that is driven by a predefined
process description.

• Use of formalized conversations to further structure the negotiation protocol.
Speech act theory (specifically, the Knowledge Query and Manipulation Language,
KQML [2]) provides a linguistic means to define formalized messages that relate,
in the case of negotiations, to concepts such as ‘offer’, ‘reject’, ‘propose’, ‘accept’,
etc. This further helps to tailor the involved software systems to the specific
application of negotiation support: a system may, e.g., react in a different way when it
receives an offer instead of a proposal.

• Finally, the complete negotiation process may be automated (and therefore
delegated to ‘autonomous’ software components) if the ontology for the negotiation
subject has been standardized as well. In this case ‘speed’ and ‘price’ are features
that a software component is able to reason about. Therefore, AI technologies are
applied in this area for knowledge representation and for applying policies that have
been defined to control negotiation strategies. Such an intelligent software agent is
then at least capable of estimating ‘price’ and ‘speed’ and of trading off their values in
a reasonable way [21].
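
How a formalized conversation constrains the order of messages can be sketched with a small protocol checker in Python. The reply table below is an assumption made for this illustration (for instance, that only an ‘offer’ may be answered by ‘accept’); it is not KQML syntax and not a protocol prescribed by the cited work.

```python
# Which performatives may follow which; None marks the start of a conversation.
REPLIES = {
    None: {"propose", "offer"},
    "propose": {"propose", "offer", "reject"},
    "offer": {"propose", "offer", "accept", "reject"},
}

# Performatives that close the conversation, with the resulting state.
CLOSING = {"accept": "agreed", "reject": "abandoned"}


def run_conversation(messages):
    """Check a list of (sender, performative, content) messages against the
    reply table and return 'agreed', 'abandoned', or 'open'."""
    last, last_sender = None, None
    for sender, performative, _content in messages:
        if last in CLOSING:
            raise ValueError("conversation is already closed")
        if sender == last_sender:
            raise ValueError("a party may not answer its own message")
        if performative not in REPLIES[last]:
            raise ValueError(f"{performative!r} is not a valid reply to {last!r}")
        last, last_sender = performative, sender
    return CLOSING.get(last, "open")


messages = [
    ("buyer", "propose", {"price": 90}),
    ("seller", "offer", {"price": 100}),
    ("buyer", "accept", None),
]
print(run_conversation(messages))
```

Because the message content is never inspected, the same checker works whether the negotiation subject is structured or free-form.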

In practice, automated negotiation has not been used so far, for the following
reason: a strategy module can be practically used for negotiation only if the service
specification is kept simple (i.e., has only a few ‘Quality of Service’ attributes).
The more complex the specification becomes, the more effort needs to be spent on
implementing policies and strategies for the agent. If, on the other hand, a simple
specification is sufficient, the service can be considered a commodity, i.e., a good that
is offered by a large number of vendors on the market and for which individual
negotiation would come at prohibitively high costs.

Therefore, the practical integration of negotiation mechanisms won’t be feasible 
unless negotiation support is designed as an integral part of the overall software sys- 
tem. 



2.4 Service Configuration 

Services offered in an open and heterogeneous environment such as the Internet can 
only be competitive if they are flexible enough to adapt to a wide variety of user as 
well as technical requirements. Therefore, appropriate techniques are needed to (re-) 
configure an offered service dynamically according to the effective requirements in 
each case. For example, regarding the involvement of so-called ‘third party’ services - 
such as a payment or a notary function - during a business-to-business transaction, as 
many options as possible should be supported, and moreover, an option common to all 
transaction parties has to be determined and activated. 

A very generic approach to providing system support for this kind of dynamic service
configuration is to use policy management mechanisms, in which so-called
policies provide a formalization of arbitrary requirements which can be evaluated,
compared, matched (or unified), and activated in an application-independent way [20].
Policies can be added and activated fully automatically at run-time without changing
the application code. The configuration effect is achieved by modifying externally
accessible properties which are used as system parameters by the applications.
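
A minimal Python sketch of matching and activating policies, assuming that a policy is simply a mapping from a property name to the set of values a party accepts. This representation, the function names `unify` and `activate`, and the sample properties are assumptions made for illustration and are not the policy model of [20].

```python
def unify(policies):
    """Intersect, per property, the values accepted by all policies that
    mention the property."""
    common = {}
    for policy in policies:
        for prop, allowed in policy.items():
            if prop in common:
                common[prop] &= set(allowed)
            else:
                common[prop] = set(allowed)
    return common


def activate(policies, preference):
    """Pick one common value per property (preferred values first) and
    return the settings that an application would read as parameters."""
    settings = {}
    for prop, allowed in unify(policies).items():
        order = list(preference.get(prop, [])) + sorted(allowed)
        ranked = [value for value in order if value in allowed]
        if not ranked:
            raise ValueError(f"no common option for {prop!r}")
        settings[prop] = ranked[0]
    return settings


buyer = {"payment": ["card", "invoice"], "notary": ["none", "ttp"]}
seller = {"payment": ["invoice", "ecash"]}
print(activate([buyer, seller], preference={"notary": ["ttp"]}))
```

The application code only reads the resulting settings, so adding a further policy changes the configuration without touching the application itself.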

In a broader sense, flexible service configuration also means that arbitrary services
should be configurable in such a way that they can easily be plugged together to yield
new functionality, i.e., that they can be used as building blocks to assemble new
services “on the fly”. Providing technical support for this kind of requirement is
precisely the objective of componentware techniques [8], which seem to be the right
means to face the growing challenges in the field of E-Commerce, especially concerning
the requirement of dynamic adaptability. However, there are still many open questions
and unsolved problems. For example, the composition of an application system out of
components has to be distinguished from the generation of a new component out of
existing ones, since the two have to fulfill, among other things, very different
performance requirements. Such questions are currently being investigated in projects
such as DynamiCS [5] in order to make componentware technology applicable in
practice.



2.5 Electronic Contract Signing

From the legal perspective, in many cases contracts do not need to be signed explicitly.
They often become valid even if they are agreed on orally or by a conclusive action.
For E-Commerce applications, this may mean, for example, that whenever a customer
hits the ‘Buy’ button of an electronic shop application, it can be assumed that all the
respective consequences are well known and accepted.

On the other hand, for certain applications there are also several good reasons
for involving an electronic contract in (more secure) explicit
online transactions:

• A written contract cannot be repudiated. An electronic contract
can be signed by the parties as well as by a trusted third party. The signed contract
states who agreed on which terms and at which time. Any arbitration that may be
required among these parties can be settled better if there is a version of the contract
available that has been archived by a neutral auditor.

• The legal framework for online commercial transactions is being established in 
several countries now. Electronic signatures are at least accepted as an authentica- 
tion means for the document signed. However, the management of a contract still 
requires a further harmonization of the national legislation for the participating 
countries. 

• Complex legal situations can be better fixed by using a document as the
common form of agreement. It is best practice today that commercial vendors
display their terms and conditions as a part of their online presentation.
However, it would clarify the legal situation if these documents were not merely
displayed transiently on the Internet but could also be escrowed and archived
at a third party (e.g., the Chamber of Commerce). This would allow
the contracting parties to refer to such a document even a long time after it has
been replaced by a new one.

• Furthermore, some contracts that are negotiated and closed require such complex
specifications that it is essential to handle them in written form as a shared
document. This applies to work plans as well as to complex relationships of
obligations and rights within consortia.

Finally, in contrast to paper-based contracts, their electronic counterparts are
executable. Structurally, such contracts incorporate clauses that determine the
obligations and rights of each party. From the technical point of view, for many
contracts such a clause can be interpreted as an activity or a service that is to be
provided at a certain time (payments, delivery of a good or a report, translating a
document, or printing, binding, and delivering books). Therefore, the execution phase of
the commercial transaction is interpreted not only in the legal sense as the execution
of a contract, but specifically in a technical sense as the invocation of the
corresponding services through remote method invocations.



2.6 Electronic Contract Execution

Considering the final transaction phase, we may view possible levels of electronic 
support again from several different angles. 

• At the very least, the legal execution can be monitored at the human level, as is the
case in ‘classical’ commerce.

• However, since deadlines, durations, etc. can be represented electronically, this
information may also be transferred to a workflow system that automatically sends
notifications to the parties involved. These notifications refer to the actions the
parties agreed upon, e.g., initiating a payment or performing an action. This can then
be called a ‘supportive’ workflow system.

• At the most sophisticated level, these actions may even be triggered by a workflow
engine that performs method invocations on the different information systems which
the parties have made available to each other. In this case, a distributed computing
infrastructure is assumed that easily enables market participants to be represented not
only through Web servers but also through distributed object-oriented software
components [13]. Moreover, these components need to be configurable at run-time
in order to integrate them as a part of the commercial transactions.
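
The ‘supportive’ level can be illustrated with a few lines of Python that derive notifications from electronically represented deadlines. The `Obligation` record, the `lead_days` parameter, and the message texts are assumptions introduced for this sketch; a real workflow system would also deliver the messages to the parties.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Obligation:
    party: str       # who has to act
    action: str      # what was agreed upon
    deadline: date


def due_notifications(obligations, today, lead_days=3):
    """Return (party, message) pairs for obligations that are overdue or
    fall due within `lead_days` days, ordered by deadline."""
    notes = []
    for ob in sorted(obligations, key=lambda o: o.deadline):
        days_left = (ob.deadline - today).days
        if days_left < 0:
            notes.append((ob.party,
                          f"OVERDUE: {ob.action} was due {ob.deadline.isoformat()}"))
        elif days_left <= lead_days:
            notes.append((ob.party,
                          f"Reminder: {ob.action} due in {days_left} day(s)"))
    return notes


obligations = [
    Obligation("buyer", "payment", date(1998, 11, 23)),
    Obligation("seller", "delivery", date(1998, 12, 24)),
]
for party, message in due_notifications(obligations, today=date(1998, 11, 21)):
    print(party, message)
```

At the more sophisticated level described in the last bullet, the same loop would invoke a method on the responsible party's information system instead of merely producing a message.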

This final situation is, however, only possible if a global network of related objects
exists and if these objects can be inspected, refined, combined, and integrated as
dynamically as a contract is set up and executed. For this reason, a specific
requirement is to translate the workflow specification that all transaction parties agreed
upon into the process definition that is required for a given workflow engine.

Therefore, such an approach cannot be successful if the workflow mechanism is
isolated from the previously mentioned mechanisms for negotiation and signing.
Accordingly, the example E-Commerce infrastructure reference architecture, as
developed in the EU/ESPRIT project COSMOS and briefly presented below, has been
designed and is currently being implemented as an integrated whole.



3 Case Study: The COSMOS Project 

As said before, in order to accommodate the different functions mentioned above in a 
common framework, a unified systems architecture is required - at least in terms of an 
integrated object model and a functional specification of basic software components. 
Exactly this is the goal of COSMOS (Common Open Service Market fOr SMEs), a
European ESPRIT research project that designs and implements important technical
software system components for carrying out business transactions across the Internet
[4]. In the following, the respective approach is presented in some detail as a case
study for demonstrating the use of some typical distributed systems functions and 
technologies as mentioned above. 

Compared with existing electronic commerce architectures such as CommerceNet 
eCo [3], TINA [22] and the OMG Electronic Commerce Reference Architecture [15], 
COSMOS is tailored around the concept of a contract, which is not only used as a 
metaphor but as a tangible part of the complete process of a commercial transaction. 
The COSMOS architecture mainly focuses on software design aspects and less on 
organizational questions. For the latter we refer to the COSMOS white paper [4]. 



3.1 COSMOS Contract Model 

As the most relevant part of the COSMOS object model, we focus on the modeling of 
contracts since they serve as the common nexus for most of the transaction phases and 
building blocks of the implementation. 

A COSMOS contract could be considered a structured document composed of
text blocks. In this case, the editing process would be simplified, but the automated
processing of a contract would be very limited. On the other hand, one could
attempt to cover the full semantics of a contract by building a ‘contracting expert
system’. We consider this a dead end since the expert system overhead is expected
to be too high, particularly in a Small and Medium Enterprise (SME)/Internet context,
which is characterized by a permanent change of rules, roles, and business subjects.

As a trade-off, the COSMOS contract model aims to identify only those semantically
meaningful parts of contract instances which allow for efficient automation and
therefore yield the highest added value. The parts of the contract model can be
distinguished by their subject:

• The ‘Who’ part: Parties, Persons, and Signatures relate to the participants of
the contract. Parties act under a certain role defined by the contract template. They
are instantiated as a legal entity, which can in turn be a person or an organization.
The former may, the latter must be represented by proxies. ‘Party’ only indicates that
the legal entity is involved in the contract and abstracts away from the actual tasks
which are defined for the corresponding role. Finally, each legal entity is associated
with a signature when the contract has been closed.

• The ‘What’ part is the subject of the contract. It covers all obligations of the
involved parties. Each obligation is considered a transfer of a right, which can be
either a good, a service, money, or a license. An important feature of the obligation
is a list of QoS attributes. In contract templates, it is used to specify suitable parties.
During contract negotiation, these QoS attributes are subjects of offers and
counter-offers. Finally, obligations are to be carried out on the basis of these
attributes during contract execution.

• The ‘How’ part defines relationships between obligations: when are which services
to be delivered? What is the deadline? Which clause will apply when a party falls
behind its obligation? The ‘How’ part is used to derive a workflow that defines
causal relationships, data transfers, delays and deadlines, and the final termination
of the execution phase.

• Finally, some common clauses form the fourth part of a contract. These clauses
address general terms and conditions at the level of the contract. References to
applicable external contracts, regulations, and legislation are also placed in this part.

Apart from the structural perspective, a contract goes through several steps in line with
the transaction phases:

• Initially, a contract template is defined, which usually predefines the ‘How’ and the 
‘base clauses’ parts. Additionally, roles are defined and for each obligation a re- 
quested set of conditions. However, the template does not yet identify the contract 
parties nor the exact obligations: instead of attribute/value pairs (such as ‘price per 
acre = $100’ or ‘ground’s humidity = 20%’), constraint expressions are used as 
QoS specifications (such as ‘price < $150’ and ‘humidity < 30%’). 

• Contract proposal. By using the broker, the template is completed if suitable
providers can be retrieved from the catalogue. The broker’s task is to replace QoS
specifications with the corresponding values offered. For each category of
obligations, a corresponding offer category is required in the catalogue. Accordingly,
the party objects of a contract template are replaced by the respective participant
descriptions taken from the catalogue. If the brokerage step leads to a completed
contract that can be signed in principle, a contract proposal is given.

• During negotiation, contract proposals are exchanged between the parties.
Depending on the semantics of such a contract transfer, it may be considered either
a proposal (not legally binding) or an offer (legally binding if the other
parties accept). If all parties accept, the contract is in an agreed state and ready for
signing.

• After all parties (or their proxies) have signed the contract, the electronic contracting
service certifies this. Afterwards, the contract is executable, i.e., in technical terms,
it can be transferred to the workflow system.
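
The steps a contract passes through can be sketched in Python. The sketch models a template whose obligations carry QoS constraints, a broker step that replaces the constraints by offered values, and the subsequent agreed and executable states. All class names, state names, and the catalogue layout are assumptions made for this illustration; they do not describe the COSMOS object model itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Obligation:
    role: str                                        # role named by the template
    constraints: Dict[str, Callable[[float], bool]]  # QoS constraint expressions
    party: Optional[str] = None                      # filled in by the broker
    values: Dict[str, float] = field(default_factory=dict)


@dataclass
class Contract:
    obligations: List[Obligation]
    state: str = "template"
    signatures: Set[str] = field(default_factory=set)


def _matches(offer, obligation):
    return offer["role"] == obligation.role and all(
        attr in offer["values"] and accepts(offer["values"][attr])
        for attr, accepts in obligation.constraints.items())


def propose(contract, catalogue):
    """Broker step: replace QoS constraints by offered values, if possible."""
    chosen = []
    for ob in contract.obligations:
        offer = next((o for o in catalogue if _matches(o, ob)), None)
        if offer is None:
            return False                             # template stays incomplete
        chosen.append(offer)
    for ob, offer in zip(contract.obligations, chosen):
        ob.party, ob.values = offer["party"], dict(offer["values"])
    contract.state = "proposal"
    return True


def accept_all(contract):
    if contract.state != "proposal":
        raise ValueError("only a proposal can be agreed")
    contract.state = "agreed"


def sign(contract, party):
    parties = {ob.party for ob in contract.obligations}
    if contract.state != "agreed" or party not in parties:
        raise ValueError("signing is not possible")
    contract.signatures.add(party)
    if contract.signatures == parties:
        contract.state = "executable"                # ready for the workflow system


template = Contract([Obligation("land_seller", {"price": lambda p: p < 150,
                                                "humidity": lambda h: h < 30})])
catalogue = [{"role": "land_seller", "party": "B",
              "values": {"price": 100, "humidity": 20}}]
propose(template, catalogue)
accept_all(template)
sign(template, "B")
print(template.state, template.obligations[0].values)
```

The example reuses the constraints ‘price < $150’ and ‘humidity < 30%’ from the template description above; the broker replaces them by the attribute values of the first matching offer.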



3.2 COSMOS Building Blocks 

The five functions discussed in Section 2 are covered respectively by five corre- 
sponding building blocks in the COSMOS reference architecture underlying the 
COSMOS prototype implementation (see Fig. 2). 

These functions are tightly integrated since they communicate with each other by 
using contracts as the common representation for the data transferred. All parts of the 
respective COSMOS prototype may be used optionally by the market participants. It
may happen, for example, that a consortium was already formed before contract
negotiations start, so that the catalogue and broker functions are not needed in this
case. In other cases, no negotiation is required since the negotiation process itself
would be too costly compared with the transaction volume. Finally, one may think of 
scenarios where no workflow execution is necessary or possible. The configuration of 
COSMOS components is thus dynamic and depends on the specific business require- 
ments of different kinds of application (i.e. business) transactions. 

The COSMOS reference architecture abstracts from implementation technology.
This concerns not only its software components but also the contract model.
Several current technologies can therefore be used for realizing the COSMOS
E-Commerce negotiation support prototype implementation, e.g.:

• Current Web technology provides the highest performance for online access.
This is also the area where new standards emerge at the highest pace.

• The OMG CORBA standard promises independence from proprietary hardware and
operating systems; here, BOCA is gaining increasing visibility [17].

• Several frameworks are available for a 'plain Java' approach, e.g. Voyager
[18] for the communication platform or IBM's San Francisco Framework [10] for
building component-based applications.

• Finally, legacy technologies such as EDI will also have to be supported in the
future as an (optionally) integrated part of such an E-Commerce systems
infrastructure architecture.

Fig. 2. Building blocks of the COSMOS architecture



4 Conclusions and Further Developments 

System support for electronic commerce is a highly relevant practical topic for
distributed systems research and technology. In order to meet the inherent
openness and flexibility needs of global electronic markets, it requires ad-hoc
software integration both at the system level and at the application level. At
the same time, these requirements cause several problems for the introduction of
electronic commerce applications: on the one hand, they need to be standardized
in order to cooperate properly with one another; on the other hand, they need to
be dynamically deployable, extensible, and integrable. As a result, we face a
‘balkanized’ separation of electronic commerce tools and technologies today that
can only be made interoperable if they adhere to certain standards. However,
standardization is a long-term process during which technology development often
makes many of the efforts spent there obsolete.

Under these circumstances, the rationale for a generic system infrastructure
like the one developed in the COSMOS project is to decompose its functional
components into consecutive phases, rather than horizontally into abstraction
layers. The links between these components can then be standardized at the level
of the contract model, while any technology decisions for individual component
developers are deliberately left open. An important prerequisite for realizing
such a system architecture, however, is a (much more general) open and
dynamically controllable component-based approach to software development,
including at the application level.

After having implemented the required functions for the support of commercial 
transactions, current research developments address additional technological refine- 
ments. To give an example, generic support for auction systems that is currently being 
developed at Hamburg University [21] will be integrated to support group negotiation 
patterns. Another example is the integration of market participants by following the 
component-based approach to software integration. A possible direction has been 
described in [13]. 

Finally, mobile agent technology is incorporated in the COSMOS project for the 
transfer of contracts between negotiating parties. Additional rationale for the applica- 
bility of this technology is given in [7]. 

Abstract. Interactive media server systems play an important role in
the envisioned ‘Information society’. Powerful media server systems are 
one of the cornerstones of the networked society in which media servers 
store news information, product descriptions, customer information, vi- 
deo clips and many other media elements that are used to inform consu- 
mers, run businesses, or entertain people. 

Within this paper we distinguish two types of media objects: realtime media on
the one hand and non-realtime media objects on the other. Whereas realtime
media, e.g. audio and video streams, are mainly used in information and
entertainment applications, non-realtime media are used in all general purpose
applications, e.g. conventional web services. The paper presents the design of
two media server systems, each handling one of the two types of media objects.
Both server systems described in the paper are based on a distributed memory
parallel computer system. For each of the server systems presented here, a
single important question is studied in detail: the data layout question for
non-realtime media servers and the communication scheduling problem for
realtime media servers.



1 Introduction 

Digital libraries as they exist today and will be available in the future
contain all kinds of media objects. Ranging from simple ASCII text documents to
structured documents using hyperlinks and integrated animation, sound and video,
a whole range of digital libraries exist today or can be envisioned for the near
future.

The media components stored by a digital library can be distinguished according
to their realtime properties. Media objects such as audio and video have strict
realtime requirements for their delivery and presentation from the server to the
client. If an audio/video stream is delivered from the server to a client, data
packets must be sent right in time, i.e. not too late, to avoid interruption of
the presentation, and not too early, to avoid buffer overflow on the client.
Thus, a server system delivering continuous media information has to take these
realtime properties into account. As audio and video objects are also usually
very large

* This work was partly supported by the MWF Project “Die Virtuelle
Wissensfabrik”, the EU project SICMA, and the DFG Sonderforschungsbereich 1511
“Massive Parallelität: Algorithmen, Entwurfsmethoden, Anwendungen”.

in size and in amount of data that has to be delivered from a server to a client,
powerful server systems using a moderate number of processors (SMP systems) 
or a scaleable number of processors (MPP) are used for the implementation of 
continuous media servers. 

Digital libraries that contain a moderate number of media objects having no
realtime characteristic are mostly stored on conventional computer systems
(holding at most a moderate number of processors and storage subsystems). If a
digital library of this kind is accessed by a larger number of clients in
parallel (which all expect short latency times), or if the amount of data items
stored in the library becomes very large, the use of moderate or large parallel
systems is favourable here as well.

Within this paper the design of parallel interactive media server systems for
the storage and delivery of both kinds of media objects is presented, i.e. we
study server systems that take the special characteristics of realtime media
into account as well as media servers for the delivery of non-realtime media to
a large number of clients. We study the implementation of Interactive Continuous
Media Servers (ICMS) for the delivery of encoded audio and video streams as well
as that of a scaleable web server for the delivery of non-realtime media
objects.

The highest level of abstraction considers a parallel interactive media server
as a set of storage subsystems, processors, and external communication
interfaces (connecting the ICMS to an access and delivery network) which are
connected by an internal communication network. Figure 1 presents this abstract
model for a parallel media server. We will always consider disks as the storage
device, although in real world implementations the storage subsystem itself will
consist of a hierarchy of fast memory that can be used for caching, disks, and
magneto-optic devices for mass storage. A processor can be connected to storage
devices, to external communication interfaces, or to both of them.

The internal network is built as a structured graph like the butterfly, the 
square n x n grid, or the complete bipartite graph |S|. By using such an in- 
ternal communication network, the overall communication bandwidth can be 
considerably increased compared to systems that are based on buses. 

The implementation of scaleable servers for the delivery of realtime and
non-realtime media objects has gained considerable attention in recent years, as
a number of important research questions have to be solved for their optimal
implementation.

Within this paper we will focus on two important questions. For the parallel web
server system that is studied in section 2, the data layout question is
discussed, i.e. the question of how the media objects should be mapped onto a
set of storage devices.

For the parallel ICMS presented in section 3, the question of communication
scheduling, which becomes very important for distributed memory ICMS, is
discussed.

Both server systems have been implemented and integrated into various
applications which are described briefly in this paper.

Fig. 1. Abstract model of an ICMS

1.1 Model 

To describe the methods used for the determination of a suitable data layout of
a parallel web server, the model of the parallel web server used here is
described first. We concentrate in our model on the aspects that are important
for the data layout and model other aspects on a higher level of abstraction.

A parallel web server is built from the following entities:

— A number of processing modules that are connected by some kind of network 
or bus architecture. 

— A number of communication devices that connect the processing modules to 
the external clients accessing the server and requesting information. 

— A number of storage devices (disks) that are connected to the processing 
modules. 

The parallel web server stores data items (files) on the disks and works as 
follows: 

— The server is able to accept one or more requests arriving via the
communication devices from the external clients per time step.

— The server forwards a request to the disk holding the requested item. We
assume here that each item is stored only once in the overall disk pool.

— Each disk can accept only one request per time step. If more than one request
is sent to a disk per time step, these requests queue up (conflict resolution
via random selection).

— The processing time for each request on the disk is constant and takes one
time unit.

— In one time unit, a disk can process only one request.

— All disks are independent, so that the server can process a maximum of n
requests per time step if n is the number of disks.

If two (or more) requests access the same storage device (disk) within a time
interval of length δ, we say that these requests are colliding, i.e. these
requests are in collision.

The aim of our work is now to develop a data layout strategy that maps the data
items onto the disks in such a way that the requests arriving at the server lead
to a minimal number of collisions and therefore to a minimal latency in
answering the data requests issued by the external clients.
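
The server model above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the implementation used in the paper: the function name `average_latency`, the integer time steps, and the first-come-first-served queue order are hypothetical choices (the model itself resolves conflicts by random selection, which gives the same mean latency when every request takes one time unit).

```python
from collections import deque

def average_latency(requests, layout, n_disks):
    """Mean latency (in time steps) of requests under a given data layout.

    requests: list of (time_step, item) with integer time steps
    layout:   dict mapping each item to a disk index in range(n_disks)
    Each disk serves exactly one queued request per time step.
    """
    by_step = {}
    for t, item in requests:
        by_step.setdefault(t, []).append(item)
    if not by_step:
        return 0.0
    queues = [deque() for _ in range(n_disks)]
    total, served = 0, 0
    t, last = min(by_step), max(by_step)
    while t <= last or any(queues):
        # Forward every arriving request to the disk holding its item.
        for item in by_step.get(t, []):
            queues[layout[item]].append(t)
        # Every non-idle disk finishes one request in this time step.
        for q in queues:
            if q:
                arrival = q.popleft()
                total += t - arrival + 1
                served += 1
        t += 1
    return total / served

burst = [(0, "a"), (0, "b"), (0, "c")]
print(average_latency(burst, {"a": 0, "b": 0, "c": 0}, 2))  # all on one disk
print(average_latency(burst, {"a": 0, "b": 1, "c": 0}, 2))  # spread over two
```

Placing items that are requested together on different disks shortens the queues, which is the effect the data layout strategy aims for.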

1.2 Monitoring the Access to Web Servers 

To increase the overall performance of a parallel web server it is not important to 
balance the overall number of requests issued to the disks as evenly as possible, 
but to avoid that a larger number of requests is submitted to a single disk only 
while other disks are idle. This means that in each small time interval the load 
has to be distributed as evenly as possible to all disks, minimizing the latency 
time for a request this way. Thus, the aim is to minimise the number of collisions 
on the disks. 

To get an impression of the collisions that occur, we took the log files from
the web server of the University of Paderborn (www.uni-paderborn.de) for
November and December 1997.

If we study the requests issued to all files stored on the web server, we see
the pattern that is typical for web servers: some files are accessed very often,
while others are requested only seldom. Our observations show that the
frequently requested data items (files) are always the same files. Figure 2
shows the number of requests issued to all files stored on the server on two
consecutive days. The x-axis lists all files according to their hit rate on the
first day. The y-axis lists the number of hits for the files. Both axes are
drawn with logarithmic scale.

As mentioned above, the overall number of requests issued to a single data item
on a web server is not as important as the number of collisions of pairs of
files. Therefore we built all tuples (i,j) of files i and j stored on the web
server in Paderborn and measured the number of collisions, i.e. the number of
hits that were made on files i and j within the time interval δ, which is chosen
to be one second for our experiments.

Figure 3 shows the number of collisions for all pairs of files for two
consecutive days, sorted by the collisions of the first day. The x-axis lists
all tuples (i,j) of files according to the collisions on the first day, and the
y-axis lists the number of collisions for each pair of files. It is interesting
to observe that the collisions are very similar on the two consecutive days,
i.e. accesses that collide very often on the first day also do so on the second
day.

So if we develop an algorithm that minimizes the collisions for a typical access
pattern of the web server monitored on one day, these collisions will also be
reduced on the next day. Thus, we can take the similarity of access patterns
into account for the construction of our data layout strategy. This is the basic
observation and the foundation of our data layout principle.




Fig. 3. Distribution of collisions (x-axis: collision no., logarithmic scale)

1.3 Data Layout Strategies

In the following we will first explain the algorithm used to compute the data 
layout and then evaluate the performance of this algorithm in detail. 



Algorithm. Throughout the rest of the paper we define $F$ to be the set of data
items (files) and $\{(t_1, f_1), (t_2, f_2), \dots, (t_m, f_m), \dots\}$ to be
an access pattern for a web server, where $t_i$ is the time when the request for
data item (file) $f_i \in F$ arrives at the server.

Then $c(i,j) = |\{\{(t,i), (t',j)\} : |t - t'| \le \delta\}|$ is defined as the
number of collisions of files $i$ and $j$ for the given access pattern.
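
As a minimal sketch of how the collision counts c(i,j) could be extracted from a log, the following function slides a time window of length δ over the time-sorted access pattern. The function name `count_collisions`, the tuple format of the access pattern, and the use of a non-strict comparison at the window boundary are assumptions made for illustration.

```python
from collections import Counter

def count_collisions(access_pattern, delta=1.0):
    """Count c(i, j) for all pairs of distinct files.

    access_pattern: iterable of (time, file) tuples
    delta:          length of the collision interval (assumed: |t - t'| <= delta)
    Returns a Counter keyed by frozenset({i, j}).
    """
    events = sorted(access_pattern, key=lambda e: e[0])
    collisions = Counter()
    start = 0  # index of the oldest event still inside the window
    for idx, (t, f) in enumerate(events):
        while t - events[start][0] > delta:
            start += 1
        for _, f_prev in events[start:idx]:
            # Requests for the same file always hit the same disk and are
            # irrelevant for the layout, so only distinct files are counted.
            if f_prev != f:
                collisions[frozenset((f_prev, f))] += 1
    return collisions

log = [(0.0, "a"), (0.5, "b"), (2.0, "a"), (2.2, "c"), (2.9, "b")]
for pair, c in count_collisions(log, delta=1.0).items():
    print(sorted(pair), c)
```

The resulting counts are exactly the edge weights needed for the graph formulation of the layout problem.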

Our strategy is to distribute the objects stored on the web server onto the 
given disks in a way that collisions are minimised. For a given access pattern 
this leads to the following algorithmic problem: 

given: A set of data items $F$ and a given number of storage devices $n$.
question: Determine a mapping $\pi : F \to \{1, \dots, n\}$ such that
$\sum_{i \ne j,\ \pi(i) = \pi(j)} c(i,j) = \min$.

It is easy to map this problem to the MAX-CUT problem. The MAX-CUT problem is
defined as follows:

given: A graph $G = (V, E)$, weights $w(e) \in \mathbb{N}$ for all $e \in E$,
and a given number of partitions $n$.

question: Determine a partition of $V$ into disjoint sets $V_1, \dots, V_n$ such
that the sum of the weights of the edges from $E$ having their endpoints in
different $V_i$ is maximal.

The mapping is done in such a way that the data items (files) are the nodes of
the graph that has to be partitioned and the number of collisions $c(i,j)$
determines the weight of edge $\{i,j\}$. The MAX-CUT problem is known to be
NP-hard. However, there are polynomial approximation algorithms which deliver good
solutions. In our experiments we used the PARTY-Library 0 containing an effi- 
cient implementation of an extension from the partitioning algorithm described 

in 0 - 
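
To make the graph formulation concrete, here is a minimal sketch of a layout heuristic. It is explicitly not the algorithm of the PARTY library used in the experiments; it is a simple greedy placement followed by local improvement, and the names `greedy_layout` and `remaining_collisions` are hypothetical. Keys of `collisions` are assumed to be pairs of distinct files.

```python
from collections import defaultdict

def remaining_collisions(collisions, layout):
    """Total weight of collision edges whose endpoints share a disk."""
    return sum(w for (i, j), w in collisions.items() if layout[i] == layout[j])

def greedy_layout(collisions, n_disks, max_sweeps=20):
    """Map files to disks so that heavy collision edges are cut.

    collisions: dict {(i, j): c(i, j)} with i != j
    Returns a dict file -> disk index in range(n_disks).
    """
    adj = defaultdict(lambda: defaultdict(int))
    for (i, j), w in collisions.items():
        adj[i][j] += w
        adj[j][i] += w
    # Place files with the largest total collision weight first.
    order = sorted(adj, key=lambda v: sum(adj[v].values()), reverse=True)
    layout = {}

    def disk_loads(v):
        # Collision weight between v and the files already on each disk.
        loads = [0] * n_disks
        for u, w in adj[v].items():
            if u in layout:
                loads[layout[u]] += w
        return loads

    for v in order:
        loads = disk_loads(v)
        layout[v] = loads.index(min(loads))
    # Local improvement: move a file only if this strictly reduces the
    # uncut weight, so the loop terminates.
    for _ in range(max_sweeps):
        improved = False
        for v in order:
            loads = disk_loads(v)
            best = loads.index(min(loads))
            if loads[best] < loads[layout[v]]:
                layout[v] = best
                improved = True
        if not improved:
            break
    return layout

c = {("a", "b"): 5, ("b", "c"): 3, ("a", "c"): 2, ("c", "d"): 4}
layout = greedy_layout(c, n_disks=2)
print(layout, remaining_collisions(c, layout))
```

With two disks, one edge of the triangle a, b, c necessarily stays uncut; the heuristic leaves the lightest one.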

In the following, the performance of this algorithm for determining the data
layout is investigated. At first we will look at the decrease in collisions that
can be achieved if the exact access pattern is known in advance. As in reality
the access pattern is not known in advance, we will then show how the
performance of a web server in terms of reduced collisions can be improved if a
data layout is computed on the basis of the access pattern for one day. Using
this data layout, the collisions that occur for the access pattern of the next
day are measured.



Collision Resolving on One Day. In a first step we examine the gain of our 
algorithm if the access pattern and therefore the collisions are known in advance. 

To do this we took the access statistics of one day and determined the number of
collisions of each pair of data items. On the basis of these statistics, we
built the graph, partitioned the graph, and looked at how many collisions
remained and how many had been resolved.

Figure 4 shows the percentage of remaining collisions for a random partition and
for our partition, based on the access pattern of the server on November 26,
1997. The values are presented as a function of the number n of storage devices.
For a random distribution of the files onto a set of n disks, an edge describing
a collision will be cut with probability $(n-1)/n$. So the expected percentage
of remaining collisions is $100/n$.




Fig. 4. Percentage of collisions not resolved by random placement and algorithm 



The following results can be obtained from Figure 4:

— If there are few disks, only few collisions can be resolved. This can be
explained by the existence of larger cliques that cannot be resolved completely
when only a few disks are available.

— Our partition resolves clearly more collisions than the random partition.

— The more disks are available, the greater is the advantage of our partitioning
method in relation to a random mapping.

In the following we compare the results of our algorithm with the random mapping
strategy for a number of access patterns. Each access pattern represents exactly
all requests that were issued to the server during one day. We compare the
number of collisions induced by the access pattern ($k_a$) with the number of
remaining collisions that occur when applying the mapping algorithm described
above ($k_r$). The factor $f$ describes the relation between the expected number
of remaining collisions for the random mapping (which is $k_a/n$) and the number
for the mapping that is determined by the algorithm ($k_r$), i.e.
$f = k_a / (n \cdot k_r)$.
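
The two derived quantities reported for each day can be computed as follows. This is a minimal sketch assuming that a random mapping onto n disks leaves k_a/n collisions in expectation; the function names are hypothetical. Under this assumption the first row of Table 1 (sun 11/23/97, n = 8) is reproduced.

```python
def remaining_percent(k_a, k_r):
    """Share of collisions (in percent) that the layout does not resolve."""
    return 100.0 * k_r / k_a

def improvement_factor(k_a, k_r, n_disks):
    """Factor f: expected remaining collisions of a random mapping
    (assumed to be k_a / n_disks) divided by those of the computed layout."""
    return (k_a / n_disks) / k_r

# Values of sun 11/23/97 from Table 1: k_a = 75334, k_r = 3152, n = 8.
print(round(remaining_percent(75334, 3152), 2))
print(round(improvement_factor(75334, 3152, 8), 2))
```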

Table 1 shows the statistics and results of the partition for one week. The
table shows that, for n = 8, all but about 3.5 to 5 percent of the collisions
can be resolved and that the algorithm leaves about 2.4 to 3.6 times fewer
collisions than the random mapping.

Table 1. Results for a number of access patterns, each representing one day
(results for n = 8)

day            nodes   edges   k_a      k_r     k_r/k_a [%]   f
sun 11/23/97   5533    15148   75334    3152    4.18          2.99
mon 11/24/97   8877    48136   239365   12100   5.06          2.50
tue 11/25/97   7932    43720   228870   11367   4.97          2.52
wed 11/26/97   8206    41172   215825   10800   5.00          2.50
thu 11/27/97   7464    44656   231364   12021   5.20          2.40
fri 11/28/97   6976    30919   174120   8671    4.98          2.51
sat 11/29/97   5065    9059    37702    1305    3.46          3.61
sun 11/30/97   4798    8657    42544    1579    3.71          3.37



Table 2. Comparison of random placement and algorithm

disks (n)   avg(f)   max(f)   min(f)   avg(k_r) [%]
2           1.16     1.22     1.14     43.1
3           1.34     1.47     1.29     24.9
4           1.54     1.77     1.47     16.3
5           1.78     2.12     1.66     11.3
6           2.05     2.54     1.88     8.2
7           2.41     3.11     2.15     6.0
8           2.73     3.64     2.42     4.7



Table 2 shows the influence of the number of disks (n) on the performance of the
algorithm. The table presents the results for a number of days where the mapping
was determined from the access pattern of day i and was then used to determine
the remaining collisions for the access pattern of the same day i. It can be
seen that the number of resolved collisions increases strongly with n, so that
up to 95 percent of all collisions are resolved. The table also shows that the
advantage of the algorithm over the random mapping grows for larger n.



Optimize the Next Day. In the following we examine how many collisions can be
resolved if we use the access statistics of a single day i to determine the data
layout and apply this data layout to the access pattern of day i + 1. This
approach can only be successful if the collisions of successive days have some
similarity. We have already examined this similarity above.

Table 3 presents the results for a number of days where the mapping was
determined from the access pattern of day i and was then used for the access
pattern of day i + 1. The table shows the factor f comparing the performance of
the random mapping method with the algorithm presented above with respect to the
number of disks n. It also shows the percentage of remaining collisions of day
i + 1 that could not be resolved. Compared to the number of collisions that can
be resolved if the access pattern is known, there is only a small loss in
performance. Compared to the random mapping method, the algorithm still behaves
much better.
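
The next-day experiment amounts to evaluating a fixed layout against the collision counts of another day. The sketch below shows one way to do this; the function names are hypothetical, and the rule for files that did not occur on the previous day (a stable pseudo-random disk derived from a CRC of the file name) is an assumption, since the text does not say how such files are placed.

```python
import zlib

def disk_of(layout, item, n_disks):
    """Disk of an item; unseen items get a stable pseudo-random disk
    (assumption made for this sketch)."""
    if item in layout:
        return layout[item]
    return zlib.crc32(str(item).encode("utf-8")) % n_disks

def evaluate_layout(layout, next_day_collisions, n_disks):
    """Return (k_a, k_r, percent remaining) for a layout computed on day i
    and applied to the collision counts {(i, j): c(i, j)} of day i + 1."""
    k_a = sum(next_day_collisions.values())
    k_r = sum(
        w for (i, j), w in next_day_collisions.items()
        if disk_of(layout, i, n_disks) == disk_of(layout, j, n_disks)
    )
    percent = 100.0 * k_r / k_a if k_a else 0.0
    return k_a, k_r, percent

layout_day_i = {"a": 0, "b": 1, "c": 0}
collisions_next_day = {("a", "b"): 4, ("a", "c"): 1, ("b", "c"): 5}
print(evaluate_layout(layout_day_i, collisions_next_day, n_disks=2))
```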



Table 3. Comparison of random placement and algorithm using access pattern of
previous day

disks (n)   avg(f)   max(f)   min(f)   avg(k_r) [%]
2           1.11     1.14     1.07     45.2
3           1.22     1.30     1.16     27.4
4           1.34     1.45     1.25     18.7
5           1.47     1.63     1.34     13.7
6           1.61     1.85     1.44     10.5
7           1.76     2.06     1.56     8.2
8           1.87     2.16     1.70     6.7



Table 4 compares the performance of the data layout method when the exact access
pattern is known in advance with its performance when only the access pattern of
the day before is known. Values with the index sd are results from the
experiments based on the statistics of the same day, i.e. the access pattern is
known; values with the index db are results that are found if the access pattern
of day i + 1 is applied to the data layout that is determined using the access
pattern of the previous day i.



Table 4. Impact of number of disks on algorithm performance

disks (n)   f_sd   f_db   k_r,sd/k_a [%]   k_r,db/k_a [%]   (k_a/n - k_r,db)/(k_a/n - k_r,sd)
2           1.16   1.11   43.1             45.2             0.70
3           1.34   1.22   24.9             27.4             0.70
4           1.54   1.34   16.3             18.7             0.72
5           1.78   1.47   11.3             13.7             0.72
6           2.05   1.61   8.2              10.5             0.73
7           2.41   1.76   6.0              8.2              0.73
8           2.73   1.87   4.7              6.7              0.74



The value of the term

$$\frac{k_a/n - k_{r,db}}{k_a/n - k_{r,sd}}$$

shows the relative performance difference of the random mapping and the data
layout determined by the algorithm presented above, for the case that the
algorithm knows the exact access pattern or knows only the access pattern of the
day before. The values of this term are shown with respect to the parameter n.
The results show that the gain of our method over the random mapping decreases
by about 30 percent if the data layout is computed on the basis of the access
pattern from day i - 1 instead of day i and the access pattern of day i is
applied. This loss is nearly independent of the number of disks, becoming
slightly smaller for larger numbers of disks.
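
The last column of Table 4 can be recomputed from the two percentage columns. The sketch below assumes that the term is the ratio of the two gains over random placement, with the random mapping leaving 100/n percent of the collisions; this reading is an assumption, but it agrees with all seven rows of the table. The function name `relative_gain` is hypothetical.

```python
def relative_gain(n_disks, sd_percent, db_percent):
    """Gain over random placement with the previous day's pattern (db),
    relative to the gain with the same day's pattern (sd).

    All arguments other than n_disks are percentages of k_a; the random
    mapping is assumed to leave 100 / n_disks percent of the collisions.
    """
    random_percent = 100.0 / n_disks
    return (random_percent - db_percent) / (random_percent - sd_percent)

# (n, k_r,sd / k_a [%], k_r,db / k_a [%]) taken from Table 4.
rows = [(2, 43.1, 45.2), (3, 24.9, 27.4), (4, 16.3, 18.7), (5, 11.3, 13.7),
        (6, 8.2, 10.5), (7, 6.0, 8.2), (8, 4.7, 6.7)]
for n, sd, db in rows:
    print(n, round(relative_gain(n, sd, db), 2))
```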

In general the results show that a data layout based on the access pattern of a
previous day leads to a large reduction of the collisions on later days. Thus,
if the data layout is determined from the access pattern accumulated over one
day, it can be expected that this data layout also works well on the following
days.

1.4 A High Performance Digital Library System 

Based on the data layout strategy described above a parallel distributed me- 
mory WWW server has been developed that is the basic element of the ‘High 
Performance Query Server (HPQS)’. This system is developed by our group in 
co-operation with other groups from the Universities of Bielefeld, Aachen, Dort- 
mund and Hagen. For a full description of the HPQS see 0. 

The approach followed by the HPQS system extends and integrates various 
technologies into one system. The basic features of the HPQS system are: 

— Questions can be submitted using a natural language interface, which makes the
system very convenient to use.

— The questions are interpreted using a number of problem-independent databases
and some databases that depend on the targeted application, i.e. domain-specific
knowledge.

— A mediator stores metadata information that is extracted from the mass data
and uses it for further requests.

— A parallel server stores and delivers mass data information and performs
search operations on these mass data items.

— High performance specialized processors perform selected search operations on
mass data items that demand large computational power.

Figure 5 presents the structure of the HPQS. The system is built from the
following five modules, which also represent the operational structure of the
HPQS:

NLI: The natural language interface is the user interface of the system.
Questions are asked in naturally formed sentences. The SZS (semantic inter
language) representation is constructed from the questions.

Retrieval-Module: From the SZS representations an FRR (Formal Retrieval
Representation) is constructed. It makes use of transformation and
interpretation knowledge. This FRR is transformed into OQL (Object Query
Language) requests.

Multimedia Mediator: The Multimedia Mediator structures and mediates the
available mass data by managing additional metadata. OQL queries are processed
with the help of the parallel server.

Parallel Server: The parallel server manages and delivers all available mass
data. Additionally, time-consuming methods that work on the mass data can be
initiated by the Multimedia Mediator and are performed by the parallel server.

Fig. 5. The High Performance Query Server

Search-Processor: Methods that are frequently used are supported by this
specially developed hardware, which is integrated into the parallel server as a
co-processor connected via standard PCI interfaces.

An example domain is selected to demonstrate the performance of the HPQS. This
special domain can in principle be replaced by any other domain. Within the
selected domain of meteorological data, questions on meteorological data can be
submitted to the HPQS. These questions are handled by the different modules of
the HPQS and initiate search operations on the mass data stored on the parallel
server. Using the results of these search operations, an answer is generated.

In this way, questions can be answered that have a structure like:

— Where was the warmest place in Germany yesterday?

— How many days were sunny in Berlin in the last month?

— Show me pictures of the formation of clouds over Bavaria in the first week of
August!



1.5 Model 

In this section we describe the model of an ICMS that is used in our study. To do 
this, we mainly study the storage and operational model as the hardware model 
is similar to that of the parallel web server discussed in the previous section. 



Storage Model. The main task of an ICMS is to store and deliver continuous media
information. This information usually requires a large bandwidth (e.g. video
encoded with 25 frames per second) and is therefore encoded using sophisticated
encoding methods. Popular encoding standards for audio and video data have
been proposed by the “Motion Picture Expert Group (MPEG)” |S]. The MPEG- 
encoded audio/video stream is usually partitioned in large packets and mapped 
onto the storage devices. The partitioning of the MPEG stream is done at the 
time when the stream is loaded onto the server system. We assume that all 
packets have the same size and can be delivered independently from the server 
to the client. 

The mapping of the data packets onto the storage devices is determined by the
data layout, i.e. a function that maps a set of media packets onto a set of
storage devices. Popular data layouts are a random layout, i.e. data packets are
mapped with a uniform random distribution, or the linear striping method. For
linear striping, the $i$-th data packet of a stream is mapped to storage device
$\pi(i)$ and the $(i+1)$-st data packet is mapped onto storage device
$(\pi(i) + 1) \bmod n$, where $n$ is the number of storage devices (which are
assumed to all have the same storage capacity and are numbered from $0$ to
$n-1$). Thus, for the linear striping of a stream, the mapping of all data
packets is determined by that of the first packet, $\pi(1)$. For a stream $s$
that is built from $m$ packets ($s = (p^1, \dots, p^m)$) and is striped onto $n$
disks, we call $map(s) := \pi(1)$ the start disk of the stream.
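
Linear striping can be written as a one-line function of the start disk. The following minimal sketch uses 1-based packet indices as in the text; the function names are hypothetical.

```python
def striped_disk(start_disk, packet_index, n_disks):
    """Disk holding the packet_index-th packet (1-based) of a linearly
    striped stream whose first packet lies on start_disk."""
    return (start_disk + packet_index - 1) % n_disks

def stripe_stream(start_disk, n_packets, n_disks):
    """Disks of all packets p^1, ..., p^m of a stream, in order."""
    return [striped_disk(start_disk, i, n_disks) for i in range(1, n_packets + 1)]

print(stripe_stream(start_disk=2, n_packets=6, n_disks=4))
```

Consecutive packets land on consecutive disks, so a stream played at constant bitrate visits the disks in round-robin order.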



Operational Model of an ICMS. From the user's point of view an ICMS behaves like
a VCR. When the user logs into the ICMS, he selects an audio/video stream and is
allowed to play, pause, and stop the stream. Before a stream starts to play, the
admission control algorithm verifies whether the ICMS still provides sufficient
resources (disk bandwidth, communication bandwidth, buffer space, processing
power) to handle the delivery of the stream. If this check is passed and the
user is playing a stream, data packets are delivered from the ICMS to the user
client according to the defined bitrate of the stream (which is determined when
the stream is generated and which is recognized by the ICMS when the stream is
loaded onto the ICMS).

The delivery of the data packets from the storage devices to the external
communication devices is controlled by a scheduler, which triggers the delivery
of the data packets to the external communication device according to the
realtime requirements of the stream. A popular scheduling algorithm uses the
simple but effective earliest deadline first method, in which a priority queue
of events is maintained that represent time stamps for the delivery of data
packets from the storage devices to the external network interfaces. Whenever
the time of an event has expired, the scheduler sends a trigger message to the
processor holding the appropriate data packet and instructs the processor to
deliver it to the external communication device that connects to the user
client. Compared to the communication that takes place from the storage nodes to
the nodes holding the external communication devices, the communication that is
induced by the trigger messages can be neglected.
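
The earliest deadline first method described above can be sketched with a binary heap as the priority queue. This is a minimal illustration under stated assumptions, not the scheduler of the ICMS: the stream description (a dict with `start_time`, `period`, `start_disk`, and `packets`), the constant packet period derived from the bitrate, the 0-based packet counter, and the function name `trigger_order` are all hypothetical.

```python
import heapq

def trigger_order(streams, n_disks, horizon):
    """Return the trigger messages (deadline, stream, packet, disk) in the
    order an earliest-deadline-first scheduler would issue them.

    streams: list of dicts with keys start_time, period, start_disk, packets
    Streams are assumed to be linearly striped; packet k (0-based) of a
    stream lies on disk (start_disk + k) mod n_disks.
    """
    queue = []
    for sid, s in enumerate(streams):
        if s["packets"] > 0:
            heapq.heappush(queue, (s["start_time"], sid, 0))
    triggers = []
    while queue:
        deadline, sid, k = heapq.heappop(queue)
        if deadline > horizon:
            break  # every remaining event is due even later
        s = streams[sid]
        triggers.append((deadline, sid, k, (s["start_disk"] + k) % n_disks))
        if k + 1 < s["packets"]:
            next_deadline = s["start_time"] + (k + 1) * s["period"]
            heapq.heappush(queue, (next_deadline, sid, k + 1))
    return triggers

streams = [
    {"start_time": 0, "period": 2, "start_disk": 0, "packets": 3},
    {"start_time": 1, "period": 2, "start_disk": 1, "packets": 2},
]
for trigger in trigger_order(streams, n_disks=2, horizon=10):
    print(trigger)
```

Only the next packet of each stream is kept in the queue, so its size is bounded by the number of active streams.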

The main task of the scheduling algorithm is to ensure that the buffer at the
user client, which holds some data packets to tolerate network latencies, never
becomes empty. To ensure this, we assume that the delay induced by the external
network connecting the external network interface to the user client is
constant. So the task of the scheduling algorithm is to determine the delay that
is induced by the internal communication network of the ICMS and by the parallel
access of many streams to the same storage device.

1.6 Communication Scheduling in the Benes and Clos Network 

In this section we investigate how to construct a parallel ICMS that is 
able to guarantee the communication capability necessary for an ICMS. As a 
building block for this parallel ICMS we take a Butterfly network BF(2, n), i.e. a 
Butterfly network of degree 2 and dimension n connecting $2^n$ storage nodes with 
$2^n$ nodes that hold external communication interfaces. A Benes network B(2, r) 
is constructed from two Butterfly networks BF(2, r) of the same dimension 
that are connected back-to-back. For a detailed discussion of both networks see 
[15]. For the communication scheduling in the Benes network we use a classical 
result from graph theory: 

Theorem 1. (Permutation Routing in Benes networks) 

Given any one-to-one mapping $\pi$ of the $2^{r+1}$ input links to the $2^{r+1}$ 
output links of a Benes network B(2, r) of dimension r, there is a set of 
edge-disjoint paths from the inputs to the outputs connecting input i to 
output $\pi(i)$ for $0 \le i \le 2^{r+1} - 1$. 

The result of this theorem enables us to construct an ICMS using a B(2, r) 
that works as follows: 

— Connect a storage device to each of the $2^{r+1}$ input links and an external 
communication device to each of the $2^{r+1}$ outgoing links of the Benes 
network. 

— Suppose a set of streams $\{s_1, \ldots, s_t\}$ has to be mapped onto the ICMS. 
All streams are linearly striped onto the storage devices, with $map(s_1) = 0$ and 
$map(s_{i+1}) = (map(s_i) + m) \bmod 2^{r+1}$, where m is the number of packets 
of stream $s_i$. 

— Operate the ICMS in a synchronized way (rounds) using circuit switching 
routing, i.e. in each round a storage device can submit a data packet without 
any congestion to an outgoing link (external communication device). 

— In round 1 of the ICMS the identical permutation is routed, i.e. a data packet 
that originates at input link j can be submitted to outgoing link j, 
$0 \le j \le 2^{r+1} - 1$. 

— In round i of the ICMS a data packet at input link j can be submitted to 
outgoing link $out(j, i)$ if $(out(j, i) + i) \bmod 2^{r+1} = j$. 

— If a user client is connected to the external communication device k and 
enters the system to retrieve a stream s that starts at storage device j, i.e. 
the first data packet is stored on storage device j, the setup of the stream 
is delayed until a round i with $out(j, i) = k$. From this round on, the stream 
is continuously submitted to the user client. 
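
The round-based operation listed above can be sketched in a few lines. The function names are hypothetical, and the sketch takes the rule $(out(j, i) + i) \bmod 2^{r+1} = j$ literally, so the identity permutation appears whenever i is a multiple of the number of links; matching the "round 1" wording exactly would need an offset of one round. Packets of a stream are assumed to be striped one per device in increasing device order.

```python
def out_link(j, i, n_links):
    """Output link served by input link j in round i.

    Solves (out + i) mod n_links == j for out.
    """
    return (j - i) % n_links


def stripe_starts(packet_counts, n_links):
    """Start device of each stream under linear striping.

    map(s_1) = 0 and map(s_{i+1}) = (map(s_i) + m) mod n_links,
    where m is the number of packets of stream s_i.
    """
    starts, device = [], 0
    for m in packet_counts:
        starts.append(device)
        device = (device + m) % n_links
    return starts


def setup_round(j, k, current_round, n_links):
    """First round >= current_round in which input j is routed to output k."""
    wait = (j - k - current_round) % n_links
    return current_round + wait
```

Because the packet index and the round number advance together, a stream that starts on device j in a round i with `out_link(j, i, n) == k` finds its next packet on device j + 1 in round i + 1, which is again routed to output k. This is why the stream can be delivered continuously after the initial setup delay.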
Figure 6 shows the communication lines that are scheduled in the different 
rounds. As the theorem states, the communication can be routed without any 
congestion in each round. Thus, the QoS requirements of an ICMS can be 
guaranteed. 



Fig. 6. Synchronized operation of an ICMS built from a Benes network B(2, 3). 
(Figure not reproduced; it shows the input links 0 to 7 from the storage devices 
and the output links to the external communication devices.) 



Note that the number of data packets that have to be routed 
in one round over one path from the input links to the output links of the 
Benes network depends directly on the number of users that are located on 
the respective output link, i.e. if one external network interface connects to two 
clients, two data packets are routed in one round of the scheduler from the 
input channel to the output channel. The number of users that are allocated to 
a single output channel of the Benes network is determined by the admission 
control algorithm that will be described in the next section. 

An ICMS that is constructed and operated in the way described above has 
a number of very important advantages but also some disadvantages. 

The most important advantage is that in each round all communication lines 
are busy (if all external communication interfaces are used by at least one user), 
i.e. no hardware in terms of switches and wires is wasted. The routing algorithm 
is very simple [S| and can be computed online. Thus, the basic principle is ideally 
suited for the construction of large scale ICMS. 

The disadvantages of the network and the operational principle of the ICMS 
as discussed above are: 
— The number of switches that are used for the realization of a Benes network 
is very large. In fact the B(2, r) uses $(2r+1) 2^r$ nodes to connect $2^{r+1}$ storage 
nodes and $2^{r+1}$ external communication devices. 

— The network is not scalable in the sense that the number of storage nodes can 
be increased without increasing the number of nodes that can hold external 
communication devices. In fact the outgoing links of the Benes network can 
only be used for external communication devices, not to connect additional 
storage subsystems. 

— The overall communication is synchronized, i.e. there might be a decrease 
in the performance as additional effort (in terms of hardware or control 
software) is necessary to handle the overall synchronization. 

In the following we will stepwise refine the basic ICMS in order to end up 
with a final version of the structure and operational model that resolves these 
drawbacks. 



Circuit Switching Routing - Store and Forward Routing. The routing 
model that was described above assumes a circuit switching routing algorithm 
for the delivery of data packets from the storage devices to the external 
communication interfaces. As circuit switching models are not very well suited for 
large scale distributed memory parallel computer systems, a store-and-forward 
routing scheme would be preferable. 

In fact it is also possible to use the same routing method in store-and-forward 
mode. This is because of the leveled structure of the Benes network. 
Thus, if complete data packets are sent in store-and-forward mode, the 
different paths (connecting input and output links of the Benes network) can be 
routed in an edge-disjoint way. 



Optimization of the Network Structure - Clos Networks. As discussed 
above, the major disadvantage of the Benes network if used as the architecture 
for an ICMS is the fact that the number of input and output edges is the same, 
which means that one external network interface unit has to be used for each 
storage device (otherwise the output link is wasted). 

A possible solution to overcome this problem is to use the so-called folded Benes 
network or Clos network Pj. This is constructed by mapping levels i and r − i 
of a network of dimension r onto one another. In this way, the communication links 
become bidirectional and the nodes at level 0 provide 2 input and output links 
each. 

The major advantage of this network is that the number of storage devices 
and external communication devices can now be scaled to any extent, i.e. 
network switches are not wasted if only a relatively small number of external network 
devices is used compared to the number of storage devices (as is usually the 
case for typical ICMS installations). On the other hand, the routing algorithm 
can be performed in the same way as described for the Benes network using the 
bidirectional communication links of the folded network. 






1.7 A Parallel ICMS Complying with the RTSP Protocol 

Based on the data layout and scheduling method that have been presented in 
the previous section, a parallel ICMS complying with the RTSP protocol was 
developed. 

RTSP, RTP and RTCP: The “Real Time Streaming Protocol (RTSP)” is used 
to control the delivery of continuous media from an ICMS to the clients. The 
associated data delivery protocol RTP (Realtime Transport Protocol) is used 
to encapsulate the media elements delivered by the server. Closely associated 
with the RTP protocol is the RTCP (Realtime Transport Control Protocol), which 
provides information about packet loss, jitter and other measures to control the 
delivery of the data packets via the IP network from the server to the client. 

The RTSP protocol was jointly developed by Progressive Networks, Netscape 
Communications, and Columbia University to satisfy the needs for an efficient 
delivery of streamed multimedia data over IP networks. Its specification (a 
product of the Multiparty Multimedia Session Control Working Group) was 
approved by the IESG in February 1998. RTSP has its origin in the well known 
HTTP protocol. Both protocols can be used homogeneously in a common 
application and are therefore very well suited to integrate the delivery of both media 
types in future client systems. 

The idea of using RTSP and RTP is that the control and delivery of con- 
tinuous media is handled via different channels (IP connections). Whereas the 
control messages (issued via RTSP, i.e. STOP, PLAY, PAUSE, ...) are submit- 
ted via TCP/IP from the client to the server, the data packets are encapsulated 
according to the RTP payload format specification and submitted usually via 
UDP/IP from the server to the client. 
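
The split between the two channels can be illustrated with a small sketch: a textual RTSP request that would travel over TCP, and an RTP packet that would travel over UDP. The helper names and the example URL are hypothetical. The sketch builds only the 12-byte fixed RTP header (version 2, no padding, no extension, no CSRC entries) and omits any payload-specific header that a particular payload format may require, so it is not a complete implementation of either protocol.

```python
import struct

RTP_VERSION = 2
PT_MPEG_VIDEO = 32   # static RTP payload type "MPV" for MPEG-1/MPEG-2 video


def rtsp_request(method, url, cseq, session=None):
    """Build a minimal RTSP/1.0 request (control channel, sent via TCP)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        lines.append(f"Session: {session}")
    return "\r\n".join(lines) + "\r\n\r\n"


def rtp_packet(payload, seq, timestamp, ssrc,
               payload_type=PT_MPEG_VIDEO, marker=False):
    """Prefix payload with the fixed RTP header (data channel, sent via UDP)."""
    byte0 = RTP_VERSION << 6                       # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload
```

A client would, for example, send `rtsp_request("PLAY", "rtsp://server.example/lecture", 3, session="42")` on the control connection, while the server emits a sequence of `rtp_packet(...)` datagrams with increasing sequence numbers.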

The parallel RTSP server: The idea of independent channels for control and 
delivery of media streams is directly reflected in the design of the software 
architecture of the RTSP server that is shown in Figure 7. The different modules 
shown in this figure (here a configuration with 3 data retriever processes and 
2 RTP delivery processes is shown) are mapped onto the processors. The data 
retriever processes run on those processors that hold a storage 
device. The RTP data sender modules are mapped onto the processors that hold 
an external network interface card. Both types of processes can be scaled. The 
scheduler and other control processes are mapped onto one single processor. 

The RTSP module accepts incoming requests for a new user session and 
forks a new RTSP dealer thread for each session it handles. This dealer thread 
performs the RTSP control communication between the client and the server for 
the requested session. On start of the session, the admission control algorithm 
is invoked in order to decide about the acceptance of the session request, taking 
the current resources of the ICMS as well as the necessary resources (bitrate, 
memory consumption) of the requested stream into account. If the session is 
admitted, one of the RTP data sender modules is identified that provides enough 
resources to handle the delivery of the stream and is located on a processor that 
has an external network interface that is able to access the client. From then on, 
the scheduler process takes over the responsibility to trigger the data retriever 
processes for the delivery of media packets from the disks to the appropriate RTP 
data sender, which encapsulates the data packets for delivery over the network to 
the client. 

Fig. 7. Software architecture of the parallel RTSP server. (Figure not reproduced; 
its labels read: RTSP commands from client; MPEG data encapsulated in RTP on 
top of UDP/IP to client.) 

For the RTSP server presented here, we used a Clos network as the structure 
for the internal communication network. In this way it is possible to apply the 
previous results for the asynchronous scheduling, admission control and data 
layout to the RTSP server. 
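
The selection of an RTP data sender at session start can be pictured with the following sketch. It is a hypothetical greedy rule, not the admission control algorithm of the server: the `Sender` record, its fields and the tie-breaking by largest free bandwidth are all assumptions made for illustration. It only reflects what the text states, namely that the chosen sender must reach the client and must have enough resources (bitrate, memory) for the stream.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    name: str
    free_bandwidth: float     # bit/s still available on the network interface
    free_buffer: int          # bytes of buffer memory still available
    reachable: frozenset      # client networks this interface can reach


def admit(senders, client_net, bitrate, buffer_bytes):
    """Return the sender chosen for the session, or None to reject it."""
    candidates = [s for s in senders
                  if client_net in s.reachable
                  and s.free_bandwidth >= bitrate
                  and s.free_buffer >= buffer_bytes]
    if not candidates:
        return None
    best = max(candidates, key=lambda s: s.free_bandwidth)
    best.free_bandwidth -= bitrate        # reserve resources for the stream
    best.free_buffer -= buffer_bytes
    return best
```

A complete admission test would also have to account for disk bandwidth and the capacity of the internal network, which this sketch leaves out.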

Integration of the RTSP server: The RTSP server is used in some European 
research and development projects. The SICMA project (SICMA = Scaleable 
Interactive Continuous Media Server - Design and Application) uses the parallel 
RTSP server for a teleteaching application at the Limburg University in Belgium. 
Students from different departments can access lectures that contain audio/video 
material (about 70 GByte of MPEG-1 and MPEG-2 encoded data) from about 
45 terminals that are located in the University. At the same time, students can 
access the lecture material from their homes using ADSL modems 
that are able to transport about 6 MBits/sec via telephone lines from the server 
to the local PC. 

Additionally, the parallel RTSP server is used in the EPRI-COM project that 
aims at providing high performance audio/video services to parliamentarians of 
the European Parliament and of some national European parliaments. The idea 
of this project is to build up an information service for European parliaments 
that integrates text, graphics, pictures as well as audio and video sequences. The 
material is collected from debates in the European Parliament and customised by 
different companies. The parallel RTSP server is used to store all this information 
and deliver it to the connected parliaments. As it is not feasible to deliver the 
content from one central server to all European parliaments the idea is to mirror 
the content from a larger server installed in Brussels to a number of smaller RTSP 
servers installed in each parliament. From there on, the audio/video streams are 
directly delivered to the parliamentarians that are connected to the local server. 
Abstract. The area of broadband communication networks gives rise to 
a large number of on-line problems. One of the most extensively studied 
problems in this area is the on-line processing of calls; two classes of 
problems have been considered: call control and load balancing. In the 
first case a sequence of requests for calls is given on-line to an algorithm, 
and each request can be either accepted or rejected; the algorithm has to select a 
virtual circuit between the communicating parties of an accepted call, 
obeying the network constraints, with the goal of maximizing the total 
benefit of accepted calls. In load balancing the goal of the algorithm is 
to find virtual circuits for all calls that minimize the use of network 
resources. 

Algorithms for on-line problems are usually analysed in terms of their 
competitive ratio, i.e., the worst case, over all input sequences, of the 
ratio between the values of the solution found by an optimal off-line 
algorithm (that knows the whole sequence in advance) and by the on- 
line algorithm. 

In this talk we review the main results that have been proposed in the 
literature by presenting both deterministic and randomized algorithms 
for various kinds of network topologies and discussing lower bounds. 
Abstract. Recent times have seen quite some progress in the development 
of exponential time algorithms for NP-hard problems, where the base of 
the exponential term is fairly small. These developments are also tightly 
related to the theory of fixed parameter tractability. In this incomplete 
survey, we explain some basic techniques in the design of efficient fixed 
parameter algorithms, discuss deficiencies of parameterized complexity 
theory, and try to point out some future research challenges. The focus of 
this paper is on the design of efficient algorithms and not on a structural 
theory of parameterized complexity. Moreover, our emphasis will be laid 
on two exemplifying issues: Vertex Cover and MaxSat problems. 



1 Introduction 

How to cope with intractability? This is one of the most important problems 
in the theory and practice of computer science. Several methods to deal with 
this problem have been developed: approximation, average case analysis, 
randomization, and heuristics. All of them have their drawbacks, such as hardness 
of approximability, lack of mathematical tools and results, limited power of the 
method itself, or the lack of any provable performance guarantees. 
Parameterization, whose cantus firmus can be characterized by the words “not all forms 
of intractability are created equal” I2H, is another proposal how to cope with 
intractability in some cases. This is the basic subject of this paper. 

Many hard computational problems have the following general form: given an 
object x and a natural number k, does x have some property that depends on k? 
For instance, the NP-complete Vertex Cover problem is: given an undirected 
graph G = (V, E) and a natural number k, does G have a vertex cover of size 
at most k? Herein, a vertex cover is a subset of vertices $C \subseteq V$ such that each 
edge in E has at least one of its endpoints in C. In parameterized complexity 
theory, this natural number k is called the parameter. In many applications, the 
parameter k can be considered to be “very small” in comparison with the size 
|x| of the given object x. Hence, it may be of high interest to ask whether these 
problems, which usually are NP-hard, have deterministic algorithms that 
are exponential only with respect to k, but polynomial with respect to |x|. 

* Supported by a Feodor Lynen fellowship of the Alexander von Humboldt-Stiftung, 
Bonn, and the Center for Discrete Mathematics, Theoretical Computer Science and 
Applications (DIMATIA), Prague. Author’s address in 1998: DIMATIA MFF UK, 
Charles University, Malostranské náměstí 25, 118 00 Praha 1, Czech Republic. 

Parameterized complexity, as mainly developed by Downey and Fellows [1, 
17-21], is perhaps the latest approach to attack problems that are (worst case) 
intractable. The basic observation is that for many hard problems the seemingly 
inherent “combinatorial explosion” can be restrained to a “small part” of the 
input, the parameter. So, for instance, the NP-complete Vertex Cover problem 
allows for an algorithm with running time $O(kn + 1.3248^k k^2)$ [5], where the 
parameter k is a bound on the maximum size of the vertex cover set we are 
looking for and n is the number of vertices of the given graph. The fundamental 
assumption is $k \ll n$. As can easily be seen, this yields an efficient, practical 
algorithm if only small values of k are involved. In this paper, we focus on issues 
concerning the development of efficient fixed parameter algorithms. However, 
there are also tight relations to the somewhat more general problem of designing 
exponential time algorithms with “small” exponential terms. 

The writing of this paper was stimulated by the following conception: It is 
widely agreed that the notion of P versus NP reasonably reflects the difference 
between tractable and intractable problems. Why? Does an algorithm with run- 
ning time putting the corresponding problem into P, have practical use? 
In general, no. The general observation, however, is that most problems in P in 
fact have 0{n^) algorithms or better ^2], which is not that enormous. Of course, 
from a practical point of view, this may still be unacceptable and usually the 
ultimate goal is a linear or quasilinear time algorithm with small constant factors. 
For parameterized complexity, expressed conservatively, such an observation is 
hard to make. Problems are called fixed parameter tractable if they can be solved in 
time $f(k) n^{O(1)}$ for an arbitrary function f only depending on k. Unfortunately, 
this $f(k)$ usually cannot be bounded so nicely as in the case of Vertex Cover 
(where $f(k) = 1.3248^k$; see the footnote), but grows much faster (e.g., still giving a harmless 
example, $f(k) = 11^k$ for the Planar Dominating Set problem EDI), making the 
fixed parameter tractable algorithm already impractical for small values of k. 
This might be one of the, so far, main deficiencies of parameterized complexity 
theory. Here, we will survey and explore some results directed to “efficient” 
fixed parameter tractability as represented by Vertex Cover. In particular, our 
main focus is on two elementary techniques used in the design of efficient fixed 
parameter algorithms: kernelization and bounded search trees. 
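
To make the two techniques concrete, here is a minimal sketch for Vertex Cover on simple graphs given as lists of edge pairs. It uses the textbook degree-based reduction (every vertex of degree larger than the remaining budget must be in the cover) and the simplest search tree that branches on the two endpoints of an edge, giving a tree of size at most $2^k$. It is not the $1.3248^k$ algorithm cited in the text, and all function names are hypothetical.

```python
def kernelize(edges, k):
    """Degree-based reduction; returns (forced, remaining_edges, k_left) or None."""
    forced, edges = set(), list(edges)
    while k >= 0:
        degree = {}
        for u, v in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        high = [x for x, d in degree.items() if d > k]
        if not high:
            break
        x = high[0]                      # degree > k: x must be in the cover
        forced.add(x)
        edges = [e for e in edges if x not in e]
        k -= 1
    # With all degrees <= k, k vertices cover at most k * k edges.
    if k < 0 or len(edges) > k * k:
        return None
    return forced, edges, k


def search_tree(edges, k):
    """Bounded search tree: one endpoint of any edge must be in the cover."""
    if not edges:
        return set()
    if k == 0:
        return None
    for chosen in edges[0]:
        rest = [e for e in edges if chosen not in e]
        cover = search_tree(rest, k - 1)
        if cover is not None:
            return cover | {chosen}
    return None


def vertex_cover(edges, k):
    """Return a vertex cover with at most k vertices, or None if none exists."""
    reduced = kernelize(edges, k)
    if reduced is None:
        return None
    forced, rest, k_left = reduced
    cover = search_tree(rest, k_left)
    return None if cover is None else forced | cover
```

After kernelization the remaining instance has at most $k^2$ edges, so the exponential search runs on an input whose size depends only on k.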

We assume the reader to be familiar with basic notions from algorithms and 
complexity as, e.g., provided by the text books I13I^VI41I44I . We omit material 
on graph minors, bounded treewidth algorithmics etc., which, on the one hand, 
play an important role in fixed parameter tractability theory, but, on the other 
hand, play a minor (sic!) role for the restricted point of view we are taking 
here — elementary methods in designing efficient fixed parameter algorithms. 

^ Note that, according to the above, we actually have running time 
$O(kn + 1.3248^k k^2)$. Assuming, however, $k \ll n$, it is easy to see that this also means a 
bound of the form $O(f(k) n^{O(1)})$. 
Let us mention in passing, however, that for graph problems graph minor theory 
is one of the main tools for showing fixed parameter tractability ITTEKI : If a 
graph class is minor closed, then this implies fixed parameter tractability of 
the corresponding problem. For instance, consider the class of graphs having a 
vertex cover of size at most k, which is closed under taking minors. Consequently, 
graph minor theory tells us that the problem is fixed parameter tractable. In 
addition, in the context of bounded treewidth there has been proposed a “design 
methodology that for many NP-hard problems results in algorithms with time 
complexity linear in the size of the input graph and only exponential in its 
treewidth, lowering the exponent of previously known solutions” (S3- Finally, 
let us mention the existence of a further general FPT method that uses hashing 
and is called “color-coding,” developed by Alon et al. ^j. 

The paper is structured as follows. In the next section, we very briefly provide 
a general overview of some main topics and ideas of parameterized complexity 
theory. In Section 3, we take a closer look at the concept of “fixed parameter 
tractability” and its criticism, thus providing the basic motivation for this paper. 
Turning to the main approach of theoretical computer science in dealing with 
intractability, that is, approximation, in Section 4 we sketch some known relations 
between approximation algorithms and parameterized complexity. Afterwards, 
based on the Vertex Cover problem, we explain the two basic techniques, 
kernelization and search trees. We present the basic ideas behind the best known 
algorithm for the Vertex Cover problem and also discuss related approaches in solving important 
problems from reconfigurable VLSI. In a later section we also discuss efficient fixed 
parameter algorithms for the maximum satisfiability problem. We end the paper 
by drawing some general conclusions. 

2 A Crash Course in Parameterized Complexity 

Given an undirected graph G = (V, E) with vertex set V and edge set E and 
a natural number k, the NP-complete Vertex Cover problem is to determine 
whether there is a subset of vertices $C \subseteq V$ with k or fewer vertices 
such that each edge in E has at least one of its endpoints 
in C. Vertex Cover is fixed parameter tractable: There is an algorithm solving 
it in time $O(kn + 1.3248^k k^2)$ [5], making it efficiently solvable for reasonably 
small values of k. By way of contrast, consider the also NP-complete Clique 
problem: Given an undirected graph G = (V, E) and a natural number k, Clique 
asks whether there is a subset of vertices $C \subseteq V$ with k or more vertices 
such that C forms a clique by having all possible edges between the 
vertices in C. Clique appears to be fixed parameter intractable: It is not known 
whether it can be solved in time $f(k) n^{O(1)}$, where f might be an arbitrarily fast 
growing function only depending on k HH|. Moreover, unless P = NP, the 
well-founded conjecture is that no such algorithm exists. Therefore, the best known 
algorithm solving Clique runs in time $O(n^{ck/3})$ [32], where c is the exponent 
on the time bound for multiplying two integer n x n matrices (currently best 
known, c = 2.376 . . . , see m)- Note that is trivial. The decisive point is 
that k appears in the exponent of n, and there seems to be no way “to shift the 
combinatorial explosion only into k”, independent from n m 

The observation that NP-complete problems like Vertex Cover and Clique 
behave completely differently in a “parameterized sense” lies at the very heart 
of parameterized complexity, which was pioneered by Downey and Fellows and 
some of their co-authors piiSI2()l21l22i47| . In this paper, we will focus on the world 
of fixed parameter tractable problems as, e.g., exhibited by Vertex Cover. Hence, 
here we only briefly sketch some very basics from the theory of parameterized 
intractability in order to provide some background on parameterized complexity 
theory and the ideas behind it. For any further details and more discussion, we 
refer to the extensive literature, e.g., im2W‘2H7j . 

Attempts to prove nontrivial, absolute lower bounds on the computational 
complexity of problems have made relatively little progress [2]. Hence, it is not 
surprising that up to now there is no proof that no $f(k) n^{O(1)}$ time algorithm 
for Clique exists. In a more complexity-theoretic language, where the class of 
parameterized problems that can be solved in deterministic time $f(k) n^{O(1)}$ is 
called FPT, this can be rephrased by saying that it is unknown whether Clique 
$\in$ FPT. The complexity class FPT is called the set of fixed parameter tractable 
problems. Analogously to classical complexity theory, Downey and Fellows 
developed some way out of this quandary by providing a completeness program. 
However, the completeness theory of parameterized intractability involves 
significantly more technical effort. We briefly sketch some integral parts of this theory 
in the following. 

To start with a completeness theory, we first need a reducibility concept: 
Let $L, L' \subseteq \Sigma^* \times \mathbb{N}$ be two parameterized languages. For example, in the case 
of Clique the first component is the input graph coded over some alphabet $\Sigma$ 
and the second component is the natural number k, that is, the parameter. For 
complexity theory people, we mention in passing that the parameter k usually is 
encoded in unary as part of the input. We say that L reduces to L' by a standard 
parameterized m-reduction if there are functions $k \mapsto k'$ and $k \mapsto k''$ from $\mathbb{N}$ to 
$\mathbb{N}$ and a function $(x, k) \mapsto x'$ from $\Sigma^* \times \mathbb{N}$ to $\Sigma^*$ such that 

1. $(x, k) \mapsto x'$ is computable in time $k'' |x|^c$ for some constant c and 

2. $(x, k) \in L$ iff $(x', k') \in L'$. 

Notably, most reductions from classical complexity turn out not to be parame- 
terized m- For instance, the reduction from Independent Set to Vertex Cover 
(see [15]) is not a parameterized one. On the other hand, the reduction from 
Independent Set to Clique actually turns out to be also a parameterized one. 
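
The difference between the two reductions can be made visible in a few lines. The sketch below uses the standard textbook constructions (complement graph for Clique, same graph with parameter n − k for Vertex Cover); the function names are hypothetical, and the brute-force checkers are included only to test the equivalence on tiny graphs.

```python
from itertools import combinations


def complement(vertices, edges):
    """Edge set of the complement graph, as a set of frozensets."""
    present = {frozenset(e) for e in edges}
    return {frozenset(p) for p in combinations(vertices, 2)} - present


def independent_set_to_clique(vertices, edges, k):
    """Parameterized reduction: the new parameter k' = k depends on k only."""
    return vertices, complement(vertices, edges), k


def independent_set_to_vertex_cover(vertices, edges, k):
    """Classical reduction: the new parameter n - k depends on the input size."""
    return vertices, edges, len(vertices) - k


def has_clique(vertices, edge_set, k):
    """Brute-force check for a clique on k vertices (tiny graphs only)."""
    return any(all(frozenset(p) in edge_set for p in combinations(c, 2))
               for c in combinations(vertices, k))


def has_independent_set(vertices, edges, k):
    """Brute-force check for an independent set on k vertices."""
    edge_set = {frozenset(e) for e in edges}
    return any(all(frozenset(p) not in edge_set for p in combinations(c, 2))
               for c in combinations(vertices, k))
```

In the second reduction the new parameter is n − k, which is not a function of k alone; this is what violates the requirement $k \mapsto k'$ in the definition above.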

Now, the “lowest class of parameterized intractability”, so-called W[1], can 
be defined as the class of languages that reduce by a standard parameterized 
m-reduction to Clique. Hence, Clique is W[1]-complete. Independent Set is also 
W[1]-complete. A further, interesting W[1]-complete problem is Weighted 
q-CNF-Sat: Given a boolean formula F in conjunctive normal form and a positive 
integer k, does F have a truth assignment of weight k? Herein, the weight of 
a truth assignment simply is the number of variables set true. Downey and 
Fellows provide an extensive list of many more W[1]-complete problems ( I ?SI2 1 1 . 
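
For illustration, the obvious algorithm for the weighted satisfiability question tries every set of exactly k variables, which gives roughly $n^k$ candidate assignments for n variables. The sketch below is this brute-force method only; the function name and the integer encoding of literals (v for a variable, -v for its negation) are assumptions made for the example.

```python
from itertools import combinations


def weighted_cnf_sat(clauses, n_vars, k):
    """Return a set of exactly k variables to set true that satisfies
    every clause, or None if no assignment of weight k exists."""
    for true_vars in combinations(range(1, n_vars + 1), k):
        chosen = set(true_vars)
        # A positive literal holds if its variable is chosen,
        # a negative literal holds if its variable is not chosen.
        if all(any((lit > 0) == (abs(lit) in chosen) for lit in clause)
               for clause in clauses):
            return chosen
    return None
```

Here the parameter k ends up in the exponent of n, which is exactly the behaviour that W[1]-hardness suggests cannot be avoided.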
As a matter of fact, a whole hierarchy of parameterized intractability can be 
defined, W[1] only being the lowest level. In general, the classes W[t] are defined 
based on “logical depth” (i.e., the number of alternations between unbounded 
fan-in And- and Or-gates) in boolean circuits. We omit any further details in 
this direction and just refer to the new monograph m or the many papers 
published on this topic, e.g., There exists a very rich structural 

theory of parameterized complexity, somewhat similar to classical complexity. 
Observe, however, that in some respects parameterized complexity appears to 
be in a sense “orthogonal” to classical complexity: For example, the so-called 
problem of computing the VC dimension from learning theory |7Enj , which is 
not known (and not believed) to be NP-hard, is W[1]-complete [1 til2()j . Thus, 
although in the classical sense it appears to be easier than Vertex Cover (which 
is NP-complete), it appears to be exactly vice versa in the parameterized sense, 
because Vertex Cover is in FPT. 

From a practical point of view, it is probably sufficient to distinguish 
between W[1]-hardness and membership in FPT. So, not being able to show 
fixed parameter tractability of a problem, it may be sufficient to give a reduction 
from Clique or Weighted q-CNF-Sat to the given problem, using a standard 
parameterized m-reduction. This then gives a concrete indication that, unless 
P = NP, the problem is unlikely to allow for an $f(k) n^{O(1)}$ time algorithm. One 
piece of circumstantial evidence for this is the result showing that the equality of W[1] 
and FPT would imply a $2^{o(n)}$ time algorithm for the NP-complete 3-CNF-Sat 
problem PEU, which would mean a breakthrough in computational complexity 
theory. 

In the remainder of this paper, however, we concentrate on the world inside 
FPT and the potential it carries for improvements and future research. Lots of 
problems termed fixed parameter tractable by the theory still wait for a proof 
of real “parameterized efficiency.” There seem to be plenty of fields like compu- 
tational biology or VLSI design, offering natural parameterized problems with 
efficient fixed parameter algorithms still to be discovered (also see |2 1 122f47j i. 

3 On the Meaning of Fixed Parameter Tractability 

Vertex Cover has an $O(kn + f(k))$ algorithm, where $f(k) = O(1.3248^k k^2)$ p|. 
So, even for values like k = 70, this still makes an efficient algorithm, giving this 
result potential for practical importance. On the other hand, in the definition of 
FPT, $f(k)$ may take unreasonably large values, e.g.. 




Even a less enormous value like $f(k) = 11^k$ for the Planar Dominating Set 
problem mi only provides efficient algorithms for quite small values of k. Downey 
and Fellows mi introduced so-called klam values to address this. The klam value 
of an algorithm A solving a problem L is defined to be the largest k such that 
Table 1. Comparing the efficiency of various MaxSat algorithms with respect to the 
exponential terms involved. 

  k     2^{2k}       (1.6181)^k       (1.3995)^k 
  10    ≈ 10^6       ≈ 124            29 
  20    ≈ 10^12      ≈ 15140          831 
  30    ≈ 10^18      ≈ 1.9 · 10^6     ≈ 24000 
  40    ≈ 10^24      ≈ 2.3 · 10^8     ≈ 6.9 · 10^5 
  50    ≈ 10^30      ≈ 2.9 · 10^10    ≈ 2.0 · 10^7 
  60    ≈ 10^36      ≈ 3.5 · 10^12    ≈ 5.8 · 10^8 


1. L can be solved by A in time f{k) + and 

2. $f(k) \le U$, where U is some reasonable absolute bound on the maximum 
number of steps of any computation, e.g., $U = 10^{20}$. 

For example, using $U = 10^{20}$, the current klam value for Vertex Cover is 
approximately 165. Unfortunately, klam values of comparably high quality 
are known for only a few parameterized problems. Hence, an important algorithmic challenge 
concerning FPT problems is to provide klam values as large as possible. 
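
A klam value in the sense just defined can be computed mechanically once $f$ and $U$ are fixed. The sketch below assumes that $f$ is increasing and uses exact integer or rational arithmetic to avoid rounding problems; the helper names are hypothetical. The exact figures depend on which polynomial factors are counted as part of $f(k)$, so the results need not coincide with the approximate values quoted in the text.

```python
from fractions import Fraction


def klam(f, bound=10**20):
    """Largest k >= 1 with f(k) <= bound, assuming f is increasing.

    Returns 0 if already f(1) > bound.
    """
    k = 0
    while f(k + 1) <= bound:
        k += 1
    return k


def vertex_cover_term(k):
    """The term 1.3248^k * k^2, evaluated exactly with rationals."""
    return Fraction(13248, 10000) ** k * k * k


def pure_exponential(base_num, base_den=1):
    """Return the function k -> (base_num / base_den)^k."""
    base = Fraction(base_num, base_den)
    return lambda k: base ** k
```

For example, `klam(pure_exponential(4))` evaluates the term $2^{2k} = 4^k$ against $U = 10^{20}$, and `klam(vertex_cover_term)` does the same for the Vertex Cover term including its factor $k^2$.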

To further substantiate the preceding discussion, let us briefly address another
parameterized problem, namely maximum satisfiability for a boolean formula in
conjunctive normal form (CNF) with a constant number of literals per clause
(see also Section 6 for a more complete treatment). Here, for some time the
best known algorithm had a running time that depends on the number m of clauses
and has exponential term $2^{2k}$ (cf. Table 1). This, assuming a constant number
of literals per clause, was first improved to
$O(m + k\phi^k) \approx O(m + k(1.6181)^k)$, where $\phi = (1+\sqrt{5})/2$ is the
golden ratio, and very recently was further improved to $O(m + k(1.3995)^k)$.
Note that for the improvements it is not even necessary to assume a constant
number of literals per clause. Then, however, the term m in the time bound has to
be replaced by the formula length $|F|$ and the multiplicative factor k has to be
replaced by $k^2$. Let us compare the exponential expressions involved in these
three time bounds. Table 1 provides these bounds for some reasonable values
of k, implying that the klam value increases significantly and emphasizing the
importance of the struggle to make the base of the exponential term as small as
possible. So, the klam value for MaxSat corresponding to the three exponential
algorithms referred to in Table 1 improves from approximately 35 to 100 to 140.
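
The klam values quoted above can be reproduced with a few lines of code. The
following is a minimal sketch, assuming that only the exponential term f(k) of
each running time is compared against the bound $U = 10^{20}$ and that polynomial
factors are ignored; the function name `klam` is our own choice for illustration.

```python
def klam(f, bound=10**20):
    """Return the largest k >= 0 with f(k) <= bound.

    Assumes f is nondecreasing in k, which holds for the exponential
    terms considered here.
    """
    k = 0
    while f(k + 1) <= bound:
        k += 1
    return k


# Exponential terms of the three MaxSat bounds compared in Table 1.
maxsat_terms = {
    "2^(2k)": lambda k: 2 ** (2 * k),
    "1.6181^k": lambda k: 1.6181 ** k,
    "1.3995^k": lambda k: 1.3995 ** k,
}

for name, term in maxsat_terms.items():
    print(name, klam(term))
```

Under these assumptions the three terms give 33, 95, and 137, in line with the
rounded values 35, 100, and 140. For the base 1.31951 the same function returns
166, close to the value of approximately 165 quoted for Vertex Cover.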

Finally, to also demonstrate the problematic nature of the comparison “fixed 
parameter tractable” versus “fixed parameter intractable” , let us compare the 
functions 2^ and The first refers to fixed parameter tractability, 

the second to intractability. It is easy to verify that assuming input sizes n in the 
range from 10^ up to 10^®, the value of k where 2^^ starts to exceed is in the 
small range {6, 7, 8, 9}. This shows how careful one has to be with the
term fixed parameter tractable, since, in practice, with reasonable input sizes,
a fixed parameter intractable problem can easily turn out to have a more
efficient solution than a fixed parameter tractable one. A striking example in
this direction is that of computing treewidth. For constant k, there is a famous
result giving a linear time algorithm to decide whether a graph has treewidth
at most k. However, this algorithm suffers from enormous constant factors
(unless k < 3), and so a different algorithm from the literature is more practical.



4 Approximation and Parameterization 



In this section, we discuss some relations of fixed parameter (in)tractability to
approximation. Polynomial time approximation algorithms and schemes are one of
the major methods to cope with intractability. Two recent surveys are
available. Recently, deep results based on probabilistically checkable proofs
have shown that many NP-hard optimization problems are also hard to
approximate. This gives rise to the general question of the nature of the
relationship between fixed parameter tractability and approximability of
problems. In this section, we will sketch some of the known results concerning
this relationship and discuss implied consequences.

Results on the relationship between parameterized complexity and approximation
come up in at least two ways: a more structural one and a more algorithmic one.
We briefly study both of them, but afterwards direct our attention to the more
algorithmic one. Since this short section is anything but complete, we refer to
the literature for a more comprehensive treatment.

Optimization problems come in two forms: maximization and minimization
problems. We concentrate on maximization problems; the minimization case
works analogously. A maximization problem is a 3-tuple $(I, S, g)$, where $I$ is
the set of input instances, $S(x)$ is the set of feasible solutions for input
$x \in I$, and $g(x,y) \in \mathbb{N}$ is the value for each $x \in I$ and
$y \in S(x)$. The goal is to maximize $g(x,y)$. The parameterized version of a
maximization problem is: given $x \in I$ and a positive integer $k$, is there a
$y \in S(x)$ such that $g(x,y) \ge k$?

A maximization problem is polynomial time approximable to a ratio $r$ if there
is a polynomial time algorithm that, for all input instances $x \in I$, produces
a $y \in S(x)$ whose relative error satisfies

$$\frac{\max(x)}{g(x,y)} \le 1 + r,$$

where $\max(x)$ denotes the maximum value of the input instance $x$. A
maximization problem has a polynomial time approximation scheme (PTAS) if for
all $\varepsilon > 0$ there is a polynomial time algorithm that produces a ratio
$\varepsilon$ approximation. Furthermore, it has a fully polynomial time
approximation scheme (FPTAS) if it has a PTAS where, additionally, the running
time of the algorithm is polynomial in the input size as well as in
$1/\varepsilon$.

It is not very difficult to prove the following interesting result: If an
NP optimization problem has an FPTAS, then it is in FPT. The basic idea of the
proof is to make use of the fact that $\max(x)/g(x,y) \le 1 + 1/(2k)$ implies
that $\max(x) \ge k$ iff $g(x,y) \ge k$. The contrapositive consequences of this
result appear to be still more important. As a corollary we get that the NP
optimization problems that are W[1]-hard under the standard parameterized
m-reduction have no FPTAS unless W[1] = FPT. Thus, the structural theory
concerning the W-hierarchy surprisingly may give evidence on the
non-approximability of optimization problems. In other words, proving
W[1]-hardness can be seen as one way to show non-approximability. Further
results of Cai and Chen show that the parameterized versions of all maximization
problems in the class MaxSNP, introduced by Papadimitriou and Yannakakis, and all
minimization problems in the class $\mathrm{MinF}^{+}\Pi_1$, introduced by
Kolaitis and Thakur, are in FPT. Hence, besides the above-mentioned, more
structural issues, the subsequent questions arise:

1. Which problems in MaxSNP and $\mathrm{MinF}^{+}\Pi_1$ admit efficient fixed
parameter algorithms? What are the best time bounds?

2. More generally, can ideas from approximation algorithms be used for the
design of efficient fixed parameter algorithms and vice versa?

3. For optimization problems a compendium of approximability results exists.
Will the future see something analogous for efficient fixed parameter
tractability, giving the best achieved exponential time algorithms?
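
The key step of the FPTAS argument earlier in this section can be spelled out as
follows. This is a sketch in our own words, assuming (as in the definition of a
maximization problem) that $g$ takes integer values and reading the parameterized
question as $\max(x) \ge k$.

```latex
% Run the FPTAS with \varepsilon = 1/(2k); its running time is then
% polynomial in the input size and in k.
\frac{\max(x)}{g(x,y)} \le 1 + \frac{1}{2k}
\;\Longrightarrow\;
g(x,y) \ge \frac{2k}{2k+1}\,\max(x).

% If \max(x) \ge k, then
g(x,y) \ge \frac{2k^{2}}{2k+1} = k - \frac{k}{2k+1} > k - \tfrac{1}{2},
% and hence g(x,y) \ge k, because g(x,y) is an integer.

% Conversely, g(x,y) \ge k implies \max(x) \ge g(x,y) \ge k.
```

So a single run of the approximation scheme with $\varepsilon = 1/(2k)$ decides
the parameterized question.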

5 Vertex Cover Problems 

The minimization problem Vertex Cover is surely one of the best explored
parameterized problems. The problem instance is an undirected graph $G = (V, E)$
and a positive integer $k$; the question is whether there exists a “vertex cover
set” $C \subseteq V$ with $|C| \le k$ such that for all edges $(u,v)$ in $E$, it
holds that $u \in C$ or $v \in C$. Vertex Cover, sometimes called Node Cover, is
NP-complete. A straightforward greedy algorithm shows that Vertex Cover is
approximable to a ratio 1 (cf. [11]), that is, the greedy algorithm always finds
a vertex cover of size at most twice as large as the optimal one. The simple idea
behind the greedy algorithm is to pick any edge from the graph, put both
endpoints in the vertex cover, and delete these endpoints together with their
incident edges from the graph. However, unless P = NP, Vertex Cover has no
polynomial time approximation scheme, and it is known to be not approximable to
a ratio 0.1666.
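
The greedy idea just described fits in a few lines. The following is a minimal
sketch, assuming the graph is given as a list of edges (pairs of hashable vertex
names); the function name `greedy_vertex_cover` is ours.

```python
def greedy_vertex_cover(edges):
    """Greedy factor-2 approximation for Vertex Cover.

    Scans the edges once; whenever an edge is still uncovered, both of its
    endpoints are put into the cover. Skipping covered edges plays the role
    of deleting the chosen endpoints and their incident edges.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


# Path 1 - 2 - 3 - 4: the optimum is {2, 3}, the greedy cover has size 4.
print(greedy_vertex_cover([(1, 2), (2, 3), (3, 4)]))
```

The edges that trigger an insertion are pairwise disjoint, and every vertex cover
must contain at least one endpoint of each of them, which gives the factor 2.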

Although Vertex Cover is hard to approximate, it has turned out that it is
“easy to parameterize”: Vertex Cover has seen quite some history of progress
with respect to fixed parameter algorithms (see [25] for details). One of the first
results (of mainly theoretical interest) showing its fixed parameter tractability
was based on Robertson and Seymour’s deep theory of graph minors, leading
to an $O(n^3)$ algorithm for constant k. Even a linear time algorithm followed,
because graphs with “bounded vertex cover” have bounded treewidth. However,
more efficient algorithms based on techniques such as bounded search tree
and reduction to problem kernel have been obtained. Using maximum matching
as a subroutine, Papadimitriou and Yannakakis showed that Vertex
Cover admits a polynomial time solution whenever the cover size is $O(\log n)$.
Surprisingly, in essence all this already follows from the elementary search tree
method described in Mehlhorn’s textbook on graph algorithms [21, page 216],
published before all of the above-mentioned papers. Recently, Balasubramanian
et al. [3] came up with a greatly improved fixed parameter algorithm for Vertex
Cover, running in time $O(kn + (1.324718)^k k^2)$. They employ an intricate,
improved search tree algorithm. Very recently, this result was slightly improved
to $O(kn + (1.31951)^k k^2)$. Note that according to the authors this “tiny
difference amounts to a 21% improvement in the running time for k = 60.”

In the following subsection, we describe the basic ideas behind the algorithm
of Balasubramanian et al. The further improvement was achieved using similar
ideas. In particular, studying this concrete problem, the purpose is also to
become familiar with the two techniques that have so far perhaps been most
successful in designing efficient fixed parameter algorithms: bounded search
tree and reduction to problem kernel. Afterwards, in Subsection 5.2, we give one
example of how to apply this methodology to a problem originating from
reconfigurable VLSI: Constraint Bipartite Vertex Cover. This may give sufficient
stimulus to pursue further research in this direction.



5.1 General Vertex Cover 

Using an intricate, but elementary algorithmic technique, Balasubramanian et
al. developed a very efficient fixed parameter algorithm to solve Vertex Cover [3].
Let’s see how this basically works.



Method 1: Reduction to Problem Kernel. The general idea of this method, 
which is fairly generally applicable (not only to vertex cover or graph problems), 
can be expressed as follows. 

1. Reduce the given instance to a new instance whose size is exclusively bounded 
by a function of the parameter k. 

2. Perform exhaustive search in the new instance, usually employing an expo- 
nential time algorithm. 

In this way, we get Buss’ algorithm for Vertex Cover; see Fig. 1.
Obviously, the new instance has size bounded by $O(k^2)$.

The correctness of Buss’ algorithm relies on the idea that “high-degree
vertices”, that is, those of degree $> k$, must be part of any vertex cover of
size $\le k$ if one exists. It is not difficult to see, using appropriate
subalgorithms, that Buss’ algorithm has a running time of
$O(kn + (2k^2)^k k^2)$. Although in the parameterized world, reduction to problem
kernel is usually attributed to Buss, basically the same technique has been used
at least 10 years earlier in VLSI, e.g., by Evans. Reduction to problem kernel is
commonly used as some kind of preprocessing to a so-called bounded search tree
algorithm, which can already be found in Mehlhorn’s textbook [21, page 216].

Input: A graph $G = (V, E)$ and a positive integer $k$.

Output: A minimum vertex cover of size at most $k$ if one exists.

1. a) Let $H$ be the set of vertices in $V$ with degree $> k$;

b) if $|H| > k$ then “No size $\le k$ vertex cover exists”; exit;

c) Let $G' = (V', E')$ be the graph originating from $G$ by deleting all vertices in $H$
and their incident edges;

d) $k' := k - |H|$;

e) Delete all isolated vertices in $G'$;

2. if $|E'| > k k'$ then “No size $\le k$ vertex cover exists”; exit;

3. Perform exhaustive search on $G'$ to find a minimum vertex cover of size at most $k'$;

4. The minimum vertex cover for $G$ is the minimum vertex cover for $G'$ combined
with $H$.



Fig. 1. Buss’ algorithm for Vertex Cover: reduction to problem kernel.
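
Steps 1 and 2 of Fig. 1 can be sketched in code as follows. This is an
illustration under our own assumptions: the graph is a collection of edges
(pairs of distinct hashable vertices), the function name `buss_kernel` and its
return convention are hypothetical, and the exhaustive search of steps 3 and 4
is left out.

```python
def buss_kernel(edges, k):
    """Reduction to problem kernel for Vertex Cover (steps 1 and 2 of Fig. 1).

    Returns (H, kernel_edges, k_prime), or None if no vertex cover of
    size <= k can exist.
    """
    edges = {frozenset(e) for e in edges}
    degree = {}
    for e in edges:
        for v in e:
            degree[v] = degree.get(v, 0) + 1

    high = {v for v, d in degree.items() if d > k}       # step 1a
    if len(high) > k:                                     # step 1b
        return None
    # Steps 1c and 1e: drop all edges touching H; since the graph is stored
    # as an edge set, isolated vertices disappear automatically.
    kernel = {e for e in edges if not (e & high)}
    k_prime = k - len(high)                               # step 1d
    if len(kernel) > k * k_prime:                         # step 2
        return None
    return high, kernel, k_prime


# A star with centre 0 and five leaves, plus one extra edge, with k = 2.
star = [(0, i) for i in range(1, 6)] + [(6, 7)]
print(buss_kernel(star, 2))
```

In the example the centre of the star has degree 5 > k and is forced into the
cover, leaving a kernel consisting of the single edge between 6 and 7 with
remaining budget k' = 1.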



Method 2: Bounded Search Tree. The general idea of this method is to
identify a small subset of elements of which at least one must be in any feasible
solution of the problem. Here comes the application to Vertex Cover based on
Mehlhorn’s description [21, page 216]; see Fig. 2. It is called the search tree
algorithm and is quite similar in spirit to the greedy approximation algorithm
for Vertex Cover, cf. [11]. We use the notation “$G - v$” to express that vertex
$v$ and all its incident edges are deleted from $G$.

The time complexity of the algorithm can be easily bounded by $O(2^k n)$.

Using Buss’ algorithm as preprocessing phase and directly employing the
search tree algorithm, we obtain a vertex cover algorithm running in time
$O(kn + 2^k k^2)$. However, combining methods 1 and 2 and improving on method 2
leads to the result of Balasubramanian et al. [3], which we will focus on next.



An Improved Search Tree Algorithm. The key idea of Balasubramanian et
al. [3] to improve the described search tree method with exponential factor $2^k$
is to do a careful case distinction based on the degrees of the vertices of the
given graph. Observe that the factor $2^k$ actually is the size of the
search tree. So, the goal is to decrease the size of the search tree by using a
more sophisticated recursion. Before we describe this in more detail, note that
by making use of method 1 as a preprocessing phase, w.l.o.g. we can assume
that the subsequent search tree algorithm only has to operate on input graphs
of size $O(k^2)$. More precisely, what we do is to run the first two steps of the
algorithm in Fig. 1 and to replace the last two steps by an improved version
of a bounded search tree (Fig. 2).

The basic structure of the improved search tree algorithm is as follows: We
distinguish between five cases in the following order, which is given by the degree
of the vertices in the graph: First, we deal with “degree-1-vertices”, second, with
“degree-2-vertices”, third, with “degree-≥5-vertices”, fourth, with
“degree-3-vertices”, and, finally, with the remaining graph. Observe that the
remaining graph is 4-regular, that is, each vertex has degree exactly 4.
Describing all these cases is beyond the scope of this paper. To illustrate the
fundamental ideas, however, it is sufficient to describe the first two (and
simplest) ones.

Input: A graph $G = (V, E)$ and a positive integer $k$.

Output: A minimum vertex cover of size at most $k$ if one exists.

1. Construct a complete binary tree of height $k$;

2. Label the root node $(G, \emptyset)$;

3. Recursively label all tree nodes as follows, where $(H, S)$ shall be an already labeled
tree node:

a) Pick an arbitrary edge $(u, v)$ in graph $H$;

b) Label left child of $(H, S)$ with $(H - u, S \cup \{u\})$;

c) Label right child of $(H, S)$ with $(H - v, S \cup \{v\})$;

4. if there is a tree node labeled $(\emptyset, S')$ ($\emptyset$ referring to the “empty graph”)
then “$S'$ is a vertex cover of size $\le k$”

else “No size $\le k$ vertex cover exists”.



Fig. 2. Search tree algorithm for Vertex Cover.
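
The search tree algorithm of Fig. 2 can be sketched as a short recursion. The
version below is an illustration under our own assumptions: it explores the
binary tree depth-first instead of labeling it explicitly, represents a graph by
its edge list, and uses the hypothetical name `vertex_cover_search_tree`.

```python
def vertex_cover_search_tree(edges, k):
    """Return a vertex cover of size <= k as a set, or None if none exists.

    For an arbitrary remaining edge (u, v), at least one endpoint must be
    in the cover, so the recursion branches on u and on v. The depth is at
    most k, which gives a search tree with at most 2**k leaves.
    """
    edges = list(edges)
    if not edges:
        return set()          # the "empty graph": nothing left to cover
    if k == 0:
        return None           # edges remain, but the budget is used up
    u, v = edges[0]
    for w in (u, v):
        remaining = [e for e in edges if w not in e]   # the graph "G - w"
        sub = vertex_cover_search_tree(remaining, k - 1)
        if sub is not None:
            return sub | {w}
    return None


triangle = [(1, 2), (2, 3), (1, 3)]
print(vertex_cover_search_tree(triangle, 1))   # no cover of size 1
print(vertex_cover_search_tree(triangle, 2))   # a cover of size 2
```

Running it on the kernel produced by Buss’ reduction, with the reduced parameter,
corresponds to the combination of methods 1 and 2 discussed above.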

The degree-1-vertex case is trivial. If a vertex x has only one neighbor y,
then to cover the edge between them, it is always advantageous to pick y for the
vertex cover set, because if y has more than one neighbor, we cover more edges
this way than by choosing x. By always choosing y, a branching of the recursion
in the search tree can be avoided, implying a decrease of its size.

The degree-2-vertex case is more involved. We distinguish between
three subcases. Assume that the considered degree-2-vertex x has neighbors y
and z. If there is an edge between y and z, then we avoid any branching of the
recursion by always choosing y and z to be included into the vertex cover set.
It is not hard to check that this is always optimal in order to cover the edges
(x, y), (x, z), and (y, z) and further possibly incident edges to y and z. Subcase 2
addresses the setting where y and z together have at least two neighbors other
than x, say a and b. Then with not too much effort (try!) it can be checked that
either {y, z} or all neighbors of y and z have to be added to the vertex cover
set. Thus we get a branching of our recursion. Denoting by $T(k)$ the size of the
recursion tree, this branching leads to the recurrence

$$T(k) = 1 + T(k-2) + T(k-3).$$

It is important to emphasize here that this is already one of the worst cases for
the improved search tree, that is, the solution of this recurrence already yields
the exponential factor $(1.3248)^k$ for the tree size as it is part of the overall
result. Finally, subcase 3 (“otherwise”) deals with the situation when y and z
together have exactly one neighbor other than x, say a. Then again a branching of
the recursion can be avoided by the choice {a, x} for the vertex cover set. The
optimality of this choice is checked easily.
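
The base of the exponential factor belonging to such a recurrence can be computed
numerically. The sketch below rests on the standard observation that a recurrence
$T(k) = 1 + T(k-d_1) + \dots + T(k-d_r)$ grows like $x^k$, where $x > 1$ solves
$x^{-d_1} + \dots + x^{-d_r} = 1$; the function name `branching_number` and the
use of bisection are our own choices.

```python
def branching_number(vector, tol=1e-12):
    """Root x > 1 of sum(x**-d for d in vector) == 1, found by bisection.

    `vector` lists the amounts by which the parameter drops in each branch;
    it should contain at least two positive entries.
    """
    def excess(x):
        return sum(x ** -d for d in vector) - 1.0

    lo, hi = 1.0, 2.0
    while excess(hi) > 0:          # enlarge the bracket if necessary
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:        # mid is still below the root
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(branching_number((2, 3)), 6))   # branching T(k-2) + T(k-3)
print(round(branching_number((1, 1)), 6))   # simple search tree of Fig. 2
```

For the vector (2, 3) this gives about 1.324718, the base appearing in the bound
of Balasubramanian et al., while (1, 1) gives 2, the base of the simple search
tree.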

The complete analysis, involving many more and more complicated cases
with, however, the same basic flavor, gives an improved search tree of size
$(1.3248)^k$ [3]. All in all, this results in a running time of
$O(kn + (1.3248)^k k^2)$. What about the potential for improvement of this result
(besides the small one to $(1.3195)^k$ already mentioned)? It seems to be quite
complicated to do this due to the great number of case distinctions involved. In
particular, there is more than one worst case and improving some particular cases
may not help in bringing down the overall worst case. However, only elementary
combinatorial considerations have been used for this result and maybe with the
help of machine support one could still find a better recursion.



5.2 Constraint Bipartite Vertex Cover 



Kuo and Fuchs studied the Constraint Bipartite Vertex Cover (CBVC) problem,
deriving from applications in reconfigurable VLSI. The problem is, given a
bipartite graph $G = (V_1, V_2, E)$ and two positive integers $k_1$ and $k_2$, are
there two subsets $C_1 \subseteq V_1$ and $C_2 \subseteq V_2$ such that
$|C_1| \le k_1$ and $|C_2| \le k_2$ and each edge from $E$ has at least one
endpoint in $C_1 \cup C_2$? In addition, motivated by the underlying
applications, it is interesting to search for all solutions to CBVC with
minimal values for the vector $(|C_1|, |C_2|)$. CBVC is NP-complete in general.
Therefore, in practice, heuristic algorithms are used that do not always yield
optimal solutions. Since the parameter values $k_1$ and $k_2$ can be assumed to be
quite small for technological reasons (say $k_1 + k_2$ around 50 all in all),
algorithms exponential in $k_1$ and $k_2$ may be tolerable as long as the running
time is linear in the size of the problem instance.

The affinity between CBVC and Vertex Cover is obvious. However, the
existence of two parameters in combination with the bipartite nature of the
graph poses a significant hurdle. So, the Vertex Cover algorithm cannot be
translated into this new setting. However, the basic techniques of reduction
to problem kernel and bounded search tree again apply. Thus, again based
on the degree of vertices, the combination of these two techniques yields an
$O((k_1 k_2)n + (1.47)^{k_1+k_2} k_1 k_2)$ algorithm. Here, the case distinction in
comparison with the Vertex Cover case gets less complicated and deals as main
cases with “vertices of degree at least three” and “vertices of degree at most
two”. However, note that the seemingly trivial case of vertices with degree at
most two requires some care due to the existence of more than one minimal
solution (in contrast to the general vertex cover case). In particular, this
holds true when generalizing CBVC from 2 to even 3 parameters, yielding so-called
“3CBVC,” which is motivated by applications from reconfigurable programmable
logic arrays. The point here is to partition one of the two vertex sets of the
given graph into two subsets. Nevertheless, solutions of efficiency comparable to
CBVC can be achieved. Besides trying to improve the performance of the proposed
(3)CBVC algorithms, there appear to be numerous challenges from VLSI design
concerning efficient parameterized algorithms.



6 Maximum Satisfiability Problems



Maximum Satisfiability (MaxSat for short) is a problem especially well-known
from the field of approximation algorithms, having also important
practical applications. Hence, many heuristics are in use for MaxSat. The
instance is a boolean formula in conjunctive normal form (CNF); the problem
is to find a truth assignment that satisfies the maximum number of clauses. The
decision version of MaxSat is NP-complete, even if the clauses have at
most two literals (so-called Max2Sat). One of the major results in theoretical
computer science in recent time shows that if there is a PTAS for MaxSat, then
P = NP. On the positive side, it is known that MaxSat is approximable
to a ratio 0.3193. For special cases of MaxSat, better bounds are known:
MaxqSat is approximable to a ratio $2^{-q}/(1 - 2^{-q})$ if every clause contains
exactly q literals, Max3Sat is approximable to a ratio 0.2489, and Max2Sat
is approximable to a ratio 0.0741. On the negative side, MaxqSat is not
approximable to a ratio $2^{-q}/(1 - 2^{-q}) - \varepsilon$ for any
$\varepsilon > 0$ and $q \ge 3$, and Max2Sat is not approximable to a ratio 0.0476.

The natural parameterized version of MaxSat asks for an algorithm that
determines whether at least k clauses of a CNF formula F can be satisfied.
Assume that F contains m clauses and n variables. For each F there always exists a
truth assignment satisfying at least $\lceil m/2 \rceil$ clauses: simply pick any
assignment; either it does or its bitwise complement does. This can be checked in
time $O(|F|)$. This observation was used by Cai and Chen to prove that
parameterized MaxqSat for some constant q is in FPT, implying that every problem
in the optimization class MaxSNP is also in FPT. However, their algorithm relies
on the boundedness of clauses, which is not necessary for the algorithms described
in the following. Of course, one might argue that the proposed parameterization
of MaxSat does not make much sense because for $k \le \lceil m/2 \rceil$ the
problem is trivial and for $k > \lceil m/2 \rceil$ one usually cannot speak any
longer of a “small parameter value.” This is also why Mahajan and Raman
introduced a more meaningful parameterization, asking whether at least
$\lceil m/2 \rceil + k$ clauses of a CNF formula F can be satisfied. However, the
first parameterization still remains of interest, since from a
“non-parameterized point of view,” an algorithm with running time exponential in
m with a small base for the exponential factor can be of interest. So, we first
stick to this basic parameterization and afterwards very briefly deal with the
“more meaningful” parameterization.
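
The observation that some assignment always satisfies at least
$\lceil m/2 \rceil$ clauses is easy to check in code. The following minimal
sketch assumes a clause encoding of our own choosing (nonempty lists of nonzero
integers, where v stands for variable v and -v for its negation) and uses the
all-true assignment together with its complement; the function names are
hypothetical.

```python
def satisfied(clauses, assignment):
    """Count the clauses containing at least one true literal.

    assignment maps each variable (a positive int) to True or False.
    """
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def at_least_half(clauses, variables):
    """Return an assignment satisfying at least ceil(m/2) of the m clauses.

    Every nonempty clause is satisfied by an assignment or by its
    complement, so the better of the two reaches the bound.
    """
    all_true = {v: True for v in variables}
    all_false = {v: False for v in variables}
    return max(all_true, all_false, key=lambda a: satisfied(clauses, a))


clauses = [[1, 2], [-1], [-2], [1, -2]]
best = at_least_half(clauses, [1, 2])
print(best, satisfied(clauses, best))
```

Any other starting assignment would serve equally well; all-true is used only to
keep the example short.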

Mahajan and Raman presented an algorithm running in time
$O(|F| + k^2\phi^k) \approx O(|F| + k^2(1.6181)^k)$ that determines whether at
least k clauses of a CNF formula F are satisfiable. This algorithm uses a
reduction to problem kernel as well as a bounded search tree.

The reduction to the problem kernel relies on the distinction between “large”
clauses (i.e., clauses containing at least k literals) and “small” clauses (i.e.,
clauses containing fewer than k literals). If F contains at least k large clauses,
then it is easy to see that at least k clauses in F can be satisfied. Hence, the
subsequent search tree method has only to deal with small clauses. Observe that
the size of the remaining “subformula of small clauses” can easily be bounded by
$O(k^2)$. This is also owing to the fact that if the number of clauses in F is at
least 2k, then trivially k clauses in F can be satisfied.

Now the bounded search tree, here more appropriately called branching tree,
appears as follows. First, note that we can restrict ourselves to only considering
variables that occur both positively and negatively in F, because so-called “pure
literals” can always be set true, always increasing the number of satisfied clauses
without any disadvantage. The basic technique now is to pick one variable x
occurring both positively and negatively in F and then to “branch” into two
subformulas $F[x]$ and $F[\bar{x}]$, which arise by setting x to true and false.
Clearly, the size of such a branching tree can easily be bounded by $2^k$.
However, Mahajan and Raman use one further trick, which is basically as follows:
Distinguish between two cases. First, if the selected variable occurs exactly
twice in F, then one can do a “resolution” avoiding any branching of the
recursion:

$$F = (x \vee f_1) \wedge (\bar{x} \vee f_2) \wedge G,$$

where G contains no occurrence of variable x, can be replaced by

$$F' = (f_1 \vee f_2) \wedge G,$$

knowing that one clause in F could be satisfied. Second, if variable x occurs at
least three times in F, then we get for the size $T(k)$ of the branching tree the
Fibonacci recurrence

$$T(k) \le 1 + T(k-1) + T(k-2).$$

Altogether, we end up with time complexity $O(|F| + k^2\phi^k)$, where
$\phi = (1+\sqrt{5})/2$ is the golden ratio. Independently, in the context of
approximation algorithms, basically the same result was also achieved by Dantsin
et al.
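
The basic branching scheme can be sketched as follows. This is only the simple
variant with the pure literal rule and plain two-way branching, so its branching
tree is bounded by $2^k$; it does not implement the resolution rule or the
refined analysis behind the golden ratio bound. The clause encoding (lists of
nonzero integers, -v for the negation of variable v) and the function names are
our own assumptions.

```python
def assign(clauses, lit, k):
    """Set literal `lit` to true.

    Satisfied clauses are dropped, the opposite literal is deleted from the
    remaining clauses, and k is lowered by the number of satisfied clauses.
    """
    rest = [[l for l in c if l != -lit] for c in clauses if lit not in c]
    return rest, k - (len(clauses) - len(rest))


def max_sat_at_least(clauses, k):
    """Decide whether some assignment satisfies at least k of the clauses."""
    if k <= 0:
        return True
    clauses = [c for c in clauses if c]     # empty clauses stay unsatisfied
    if len(clauses) < k:
        return False
    literals = {lit for c in clauses for lit in c}
    # Pure literal rule: a literal whose negation never occurs is set true.
    for lit in literals:
        if -lit not in literals:
            return max_sat_at_least(*assign(clauses, lit, k))
    # Every variable now occurs positively and negatively: branch.
    lit = next(iter(literals))
    return (max_sat_at_least(*assign(clauses, lit, k))
            or max_sat_at_least(*assign(clauses, -lit, k)))


formula = [[1], [-1], [1, 2], [-2]]
print(max_sat_at_least(formula, 3), max_sat_at_least(formula, 4))
```

Each recursive call satisfies at least one further clause, so the recursion depth
is at most k.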

Using many more, carefully designed transformation and splitting rules for
propositional formulas, the above result could be improved to time complexity
$O(|F| + k^2(1.3995)^k)$. The fundamental ideas, which all deal with decreasing
the size of the branching tree, are as follows. First, there are several
“transformation rules”, which avoid any branching of the recursion. Besides the
described resolution rule, we also have transformation rules called pure literal
rule, complementary unit-clause rule, dominating unit-clause rule, small
subformula rule, and star rule. Refer to the literature for details. Even more
interesting (and more complicated) are the so-called splitting rules, i.e., those
rules that lead to a branching of the recursion. The basic idea is to distinguish
between the number of occurrences of the variables in formula F. Note that because
of some kind of “pre- and postprocessing” done by the transformation rules, we
always can restrict attention to variables that occur at least three times in F.
Hence, the three main cases now are if a variable x occurs at least 5 times in F,
if all variables occur exactly 4 times in F, or if there is some variable that
occurs exactly 3 times in F. Obviously, these cover all possibilities. The case of
at least 5 occurrences is quite easy and requires a branching into $F[x]$ and
$F[\bar{x}]$. The other two cases, however, are much more complicated and require
six, respectively, seven subcases. For the purpose of illustration, we just give
one of them, namely, the case that variable x occurs exactly three times in F and
F is as follows:

$$F = (\bar{x} \vee \bar{l} \vee \ldots) \wedge (x \vee l \vee \ldots) \wedge (x \vee \ldots) \wedge (l \vee \ldots) \wedge \ldots$$

That is, there is another literal l that occurs in negated form together with
variable x in one clause, occurs in positive form together with variable x in a
second clause, and occurs in at least one clause not containing variable x. Then
the rule “T2” says that we branch (or split) into $F[l]$ and $F[\bar{l}]$. It is
not hard to see that in the step from F to $F[l]$ at least two clauses are
satisfied, plus a further one by applying the resolution rule afterwards, and in
the step from F to $F[\bar{l}]$ at least one clause is satisfied, plus two further
clauses by applying the pure literal rule to x.

All in all, it can be shown that all cases lead to a recursion that yields a
branching tree size that can be bounded by $(1.3995)^k$. Interestingly, there is a
slightly “better” result if we measure the complexity not in the parameter k of
number of clauses to be satisfied, but in the total number m of clauses in F. Then
the branching tree size can be bounded by $(1.3803)^m$. Again we refer to the
literature for any further details. It is worth noting that in particular the
development of the MaxSat algorithm shows how tight the relation seems to be
between fixed parameter tractability and the in some sense more general topic of
worst case upper bounds for NP-hard problems. So, both for Vertex Cover and for
MaxSat, the best known worst case algorithms also yield the best known
parameterized algorithms.

We only mention in passing that the parameterization of MaxSat requiring
the satisfiability of at least $\lceil m/2 \rceil + k$ clauses is reduced to the
case considered above by Mahajan and Raman, thus obtaining a time complexity of
$O(|F| + k^2\phi^{6k}) \approx O(|F| + k^2(17.9443)^k)$. Plugging in the above
described improvements, we immediately get
$O(|F| + k^2(1.3995)^{6k}) \approx O(|F| + k^2(7.5135)^k)$.
Furthermore, Mahajan and Raman also study the MaxCut problem. Here,
analogous parameterizations as for MaxSat exist. The “conventional”
parameterization requiring a cut of size at least k can be reduced to a Max2Sat
problem and thus can also be improved using the above described results.

7 Conclusion 

The purpose of this paper was not to give a complete survey on how to develop 
algorithms that prove fixed parameter tractability, but, by taking a restricted 
viewpoint, we concentrated on the techniques of reduction to problem kernel 
and, even more importantly, bounded search trees. The main intention of this 
paper was to give an incomplete, but easily understandable review on interesting 
algorithmic aspects of the field and, in particular, to point out the potential for 
future algorithmic research topics in this direction. The ultimate goal of this 
work could be termed as putting the reader into the position to pursue research 
on efficient parameterized algorithms, maybe even without the need of studying 
the extensive literature on parameterized complexity.

Also, we concentrated on two core problems: Vertex Cover and MaxSat.
Besides the open problems in direct relation with these two, we strongly believe 
that there is a big potential for research on efficient parameterized algorithms 
in fields like VLSI design, computational biology, logic, data bases, and several 
others. Practical algorithms from these fields often use heuristic ideas, and may 
yield new insight for parameterized algorithm design. It is also promising to test 
parameterized algorithms in practical applications, e.g., combining or enriching 
them with some heuristics. Research on efficient parameterized algorithms thus 
could mean entering a field that offers a link between computer science theory 
and practice.



Abstract. This paper focuses on technical issues regarding the challenge of
integrating digital libraries in a work support environment that combines
information management with coordination and collaboration. Such broad scope
necessitates an extended view of electronic documents to allow for combinations
of information content together with executable components. Moreover, as a result
of the autonomy of the service and information content providers, this
environment takes the form of an information economy. This paper provides a brief
survey of research results related to architectures, metadata, interoperability
and rights management in digital library systems. This survey serves to support a
discussion of extending the scope of digital library systems toward integrated
work support environments.



1 Introduction 

Digital libraries extend and augment their physical counterparts by facilitating
access to existing information resources and services, and by providing enhanced
support of information-intensive tasks. They offer new levels of access to broader
user communities, and create novel opportunities for information exchange,
understanding, cooperation and problem solving. The vision of a worldwide digital
library is actively being pursued by many researchers, developers, and
practitioners.

As stated in HSI, interoperability is the key challenge in realising this vision, 
requiring “ways to link the diverse content and perspectives provided by
individual digital libraries around the world” in a loosely coupled federation of
autonomous systems, each with its own collection management and access policies,
to serve specialized goals and user populations. It is important to note that
interoperability needs to be addressed at multiple levels of abstraction, including
technical aspects related to system interconnectivity standards and protocols,
information-related aspects such as language, metadata, naming, semantics and
access interfaces, and interaction-related aspects concerning the rights and
obligations of users and organizations. Therefore, digital library systems necessarily
address a broad range of technical, informational, and interaction issues. 
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This paper focuses on technical issues regarding the challenge of integrating 
digital libraries in a work support environment. We consider a view of digital 
library systems that goes beyond that of a repository of on-line information, to 
provide support throughout all phases of work. [?] identifies the integration of 
information retrieval systems with other information management systems as 
the most important problem for organizations. [?] makes the point that the 
activities of users involved in information-intensive tasks can be categorized as 
discovery, retrieval, interpretation, management, and sharing. Users need to find, 
analyze and understand information of widely diverse types. Moreover, they need 
to re-organize the information to use it in multiple contexts, and to manipulate 
it in collaboration with others. This is a major departure from assuming that 
the main objective of a digital library system is to locate a particular piece of 
information. This is further discussed in [?], where, in the context of a critical 
examination of assumptions that have historically guided the development of 
networked information systems, it is pointed out that the assumption that “the 
correct answer lies in the information” does not cover the case of correlations 
that require collaborative expertise among individuals interacting with informa- 
tion. Distributed Knowledge Environments are required to provide “seamless 
interoperability among searching, authoring, and collaboration facilities” [?], 
advocating a toolbox metaphor to support users in their information management and 
analysis tasks. 

[?] argues for a common infrastructure to support both digital libraries and 
electronic commerce. In our view several important classes of large-scale 
distributed applications, such as digital library systems, electronic commerce 
environments and scientific collaborative work environments, share key requi- 
rements with emphasis on combining multiple resources made available under 
restricted terms (and therefore with a limited degree of external control) by in- 
dependent co-operating partners. We aim for a unified treatment of the problem 
of supporting such diverse applications via a set of common middleware services. 
This view fits into the network-centric application paradigm that is becoming 
more and more popular in open and dynamic environments, such as the Internet, 
as a means to utilize widely distributed application components and informa- 
tion resources that are made available by autonomous providers. In such envi- 
ronments, dynamic configuration and composition are key requirements, as the 
basis for both coordination, which entails structured processes involving mainly 
automated activities, and collaboration, which entails mainly unstructured pro- 
cesses with significant human interaction. We are working toward a common 
infrastructure for large-scale distributed applications that enables on-demand 
composition and configuration of networks of components made available by 
autonomous providers, which are coordinated in the context of work sessions. 

The remainder of this paper is organized as follows. Section 2 briefly surveys 
recent research results mainly from the areas of interoperability for digital 
library systems and electronic documents with extended functionality. Section 3 
suggests extensions in support of our vision of an integrated work support 
environment. This section also presents examples of value-added services for 
digital information objects. Where appropriate, we illustrate the issues presented 
by referring to our ongoing work toward the development of an infrastructure 
to support large-scale distributed applications. Finally, the concluding section 
summarizes the ideas presented and outlines our research plan. 

2 Current State-of-the-Art 

In this section we present a brief survey of research results related mainly to the 
problem of interoperability in digital library systems. The following subsections 
provide concise reports on the current state-of-the-art for frameworks and archi- 
tectures, metadata, interoperability protocols and intellectual property rights. 
Finally, there is a subsection that surveys work related to extended electronic 
document models, which we consider essential for extending the functionality 
of digital library systems. These reports serve to support the discussion in 
Section 3 of the proposed extension of the scope of digital library systems toward 
integrated work support environments. 

2.1 Frameworks and Architectures 

In a seminal paper [?], Kahn and Wilensky proposed a framework for distributed 
digital object services, which encompasses digital libraries as well as 
numerous other services, such as electronic commerce applications. The paper 
describes the basic entities to be found in such a system, in which information 
in the form of digital objects is stored, accessed, disseminated and managed. A 
digital object is defined to be an instance of an abstract data type with two 
components: data, in the form of typed digital material, and key-metadata, which 
includes, as a minimum, a handle that is globally unique to the digital object. 
Repositories store digital objects and are responsible for securing their resident 
objects according to their respective terms and conditions for access and usage, 
which are contained in a properties record for each digital object. A dissemina- 
tion is the result of an access request on a digital object. It contains the results of 
the access request (determined by the parameters in the request) and additional 
components specifying the origin of the dissemination and the specific terms and 
conditions governing its use. The repository access protocol includes services for 
deposit of digital objects and access to digital objects in repositories. 
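
The entities described above can be illustrated with a minimal sketch. It is 
not the repository access protocol of the cited framework: the class names, the 
"allowed_users" property and the byte-range parameters are assumptions made 
purely for illustration. 

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str                 # key-metadata: globally unique identifier
    data: bytes                 # typed digital material (here simply raw bytes)
    properties: dict = field(default_factory=dict)  # terms and conditions


@dataclass
class Dissemination:
    origin: str                 # handle of the object that was accessed
    terms: dict                 # terms and conditions governing its use
    content: bytes              # result of the access request


class Repository:
    def __init__(self):
        self._objects = {}

    def deposit(self, obj: DigitalObject) -> None:
        if obj.handle in self._objects:
            raise ValueError(f"handle already in use: {obj.handle}")
        self._objects[obj.handle] = obj

    def access(self, handle: str, user: str, start: int = 0, end=None) -> Dissemination:
        obj = self._objects[handle]
        # Hypothetical policy: an optional set of users allowed to access.
        allowed = obj.properties.get("allowed_users")
        if allowed is not None and user not in allowed:
            raise PermissionError(f"{user} may not access {handle}")
        # The request parameters determine what the dissemination contains.
        return Dissemination(handle, dict(obj.properties), obj.data[start:end])


repo = Repository()
repo.deposit(DigitalObject("hdl:example/1", b"digital libraries",
                           {"allowed_users": {"alice"}, "license": "read-only"}))
d = repo.access("hdl:example/1", "alice", start=0, end=7)
```

Here the dissemination carries its origin and terms along with the requested 
content, so a recipient can inspect the conditions without returning to the 
repository. 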

The Dienst prototype [?] implements part of the framework proposed in 
[?], focusing mainly on the design of a protocol for accessing digital objects 
in repositories. [?] present a system design approach to the framework in 
[?], based on a distributed object model. An important observation in [?] 
is that a dissemination is not restricted to have the same data as the source 
digital object. Moreover, a dissemination is not necessarily a subset of the digital 
object’s data. Thus it is possible, for example, for a digital object to be an 
executable program and disseminations to be produced by running the program 
using the parameters in the access request as input. This insight is crucial for 
incorporating computational services in an extended digital library framework. 

FEDORA is a digital object and repository architecture designed to 
provide a reliable and secure means to store and access digital content. Digital 
objects in FEDORA are content containers having a structural kernel which 
encapsulates content as opaque byte stream packages and a behavior layer that 
may implement descriptive metadata as well as access functionality for the con- 
tent packages. Through the structural kernel disseminators provide a means to 
discover and invoke content-specific behavior to digital objects. Clients can dis- 
cover, at run-time, the disseminators associated with a digital object as well as 
the methods supported by the disseminator. These methods mediate access to 
information content contained in the digital object. Access control is enforced 
by access managers which are activated when a disseminator is activated. Ser- 
vice requests targeted to a disseminator are intercepted by its associated access 
manager, which implements rights management policies. 
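
A minimal sketch of this arrangement follows, assuming a single text-oriented 
disseminator and a role-based policy. The names TextDisseminator, list_methods 
and AccessManager are hypothetical and are not taken from FEDORA itself. 

```python
class TextDisseminator:
    """Behavior-layer object exposing content-specific methods."""

    def __init__(self, package: bytes):
        self._package = package          # opaque byte stream package

    def get_text(self) -> str:
        return self._package.decode("utf-8")

    def word_count(self) -> int:
        return len(self.get_text().split())


def list_methods(disseminator) -> list:
    """Run-time discovery of the public methods a disseminator supports."""
    return sorted(name for name in dir(disseminator)
                  if not name.startswith("_")
                  and callable(getattr(disseminator, name)))


class AccessManager:
    """Intercepts each request aimed at a disseminator and applies a policy."""

    def __init__(self, disseminator, policy: dict):
        self._target = disseminator
        self._policy = policy            # method name -> set of permitted roles

    def invoke(self, role: str, method: str, *args):
        if role not in self._policy.get(method, set()):
            raise PermissionError(f"role {role!r} may not call {method!r}")
        return getattr(self._target, method)(*args)


dissem = TextDisseminator("open access to digital content".encode("utf-8"))
guard = AccessManager(dissem, {"word_count": {"guest", "member"},
                               "get_text": {"member"}})
available = list_methods(dissem)
count = guard.invoke("guest", "word_count")
```

Clients never hold a reference to the disseminator directly in this sketch; 
every call passes through the access manager, which is where a rights 
management policy would be evaluated. 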



2.2 Metadata 

Metadata, in other words machine-understandable structured information ob- 
jects describing properties of other information objects, are key components of 
network information services, supporting their interaction with software agents 
and other services as well as with human users. Metadata support a range of 
tasks including resource discovery, authentication, rights management, archi- 
ving, and system-level interoperability. In this section, we provide a brief survey 
of metadata development efforts related to digital libraries, with emphasis on 
interoperability (see also Section 2.3). We also provide a short description of the 
RDF framework and its potential. 

STARTS [23] is a protocol that addresses the main tasks performed by metasearchers 
providing unified query interfaces over multiple sources with varying 
search interfaces and models. These tasks include selection of the sources to eva- 
luate a query, evaluation of the query at each of the selected sources, and merging 
the query results. STARTS defines the metadata that each source should export, 
which include a content summary and a description of available query capabili- 
ties. 
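
The three metasearcher tasks can be sketched as follows. This is a toy 
illustration, not the STARTS protocol: the Source class, the term-frequency 
scoring and the naive merge by raw score are all assumptions. 

```python
class Source:
    def __init__(self, name, docs):
        self.name = name
        self.docs = docs                 # document id -> text

    def content_summary(self):
        """Exported metadata: term -> number of documents containing it."""
        summary = {}
        for text in self.docs.values():
            for term in set(text.lower().split()):
                summary[term] = summary.get(term, 0) + 1
        return summary

    def search(self, terms):
        """Return (score, source, doc_id) for documents matching any term."""
        hits = []
        for doc_id, text in self.docs.items():
            words = text.lower().split()
            score = sum(words.count(t) for t in terms)
            if score:
                hits.append((score, self.name, doc_id))
        return hits


def metasearch(terms, sources, max_sources=2):
    # 1. Source selection, based only on the exported content summaries.
    def promise(src):
        summary = src.content_summary()
        return sum(summary.get(t, 0) for t in terms)

    ranked = sorted(sources, key=promise, reverse=True)
    chosen = [s for s in ranked[:max_sources] if promise(s) > 0]
    # 2. Query evaluation at each selected source.
    hits = [h for s in chosen for h in s.search(terms)]
    # 3. Merging: highest score first, ties broken by source and document id.
    return sorted(hits, key=lambda h: (-h[0], h[1], h[2]))


sources = [
    Source("A", {"a1": "digital library metadata", "a2": "library services"}),
    Source("B", {"b1": "protein folding"}),
    Source("C", {"c1": "metadata for digital objects metadata"}),
]
results = metasearch(["metadata", "digital"], sources)
```

Merging by raw score is only meaningful here because all sources use the same 
scoring function; sources with differing search models would need comparable 
statistics before their results could be ranked together. 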

The Dublin Core [?] is a specification of a set of metadata elements describing 
essential properties of networked documents, such as title, author and 
publisher, primarily to support resource discovery. The Warwick framework [?] 
builds on the Dublin Core to propose a container architecture for aggregating 
multiple packages of metadata, which are separately accessible and maintainable. 
[?] combines perspectives from multiple rights-holder communities to present 
an integrated model for both descriptive and rights metadata. The provision 
for rights-related metadata provides an important extension to the Dublin Core 
proposal, explicitly addressing the problem of documenting rights ownership ag- 
reements. 
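
As a rough illustration of the container idea, the sketch below keeps a 
descriptive package and a rights package side by side. The MetadataContainer 
class, the package names and all field values are hypothetical examples rather 
than part of either specification. 

```python
class MetadataContainer:
    """Aggregates metadata packages that are maintained independently."""

    def __init__(self):
        self._packages = {}

    def put(self, kind, package):
        self._packages[kind] = dict(package)   # store a private copy

    def get(self, kind):
        return dict(self._packages[kind])      # hand out a copy

    def kinds(self):
        return sorted(self._packages)


container = MetadataContainer()
# Descriptive package using three Dublin Core style element names.
container.put("dublin-core", {"title": "Example Report",
                              "creator": "A. Author",
                              "publisher": "Example Press"})
# Rights package, maintained separately from the description.
container.put("rights", {"holder": "Example Press", "terms": "read-only"})

# Replacing one package leaves the other untouched.
container.put("rights", {"holder": "Example Press", "terms": "subscription"})
```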

An important development currently under way is the Resource Description 
Framework (RDF) [?], a framework introduced by the W3C Metadata activity, 
based on the XML metalanguage [?]. RDF is a framework for metadata that 
aims to provide interoperability between applications that exchange 
machine-understandable information on the Web. RDF data consists of nodes and 
attached attribute/value pairs. Nodes represent Web resources, such as servers and 
pages. Attributes are named properties of the nodes, and their values are eit- 
her atomic, such as text strings and numbers, or other resources or metadata 
instances. This mechanism allows the definition of labeled directed graphs that 
model semantic relationships between nodes. RDF in itself does not contain any 
predefined vocabularies for authoring metadata. Such vocabularies are expected 
to emerge as the result of consensus within specific user communities. For exam- 
ple, the Dublin Core vocabulary can be expected to be integrated in the RDF 
framework to support resource discovery in digital libraries. As another example, 
the W3C Digital Signature Working Group (DSig) proposes a standard format 
for making digitally-signed, machine-readable assertions about information re- 
sources. 
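
The node and attribute/value structure amounts to a labeled directed graph, 
which can be sketched with plain tuples. The URLs and the "dc:" and "ex:" 
property names below are illustrative assumptions, and the sketch does not use 
RDF syntax or any RDF library. 

```python
from collections import defaultdict

# Each entry is (node, attribute, value); a value is either atomic
# or the identifier of another node.
statements = [
    ("http://example.org/report", "dc:title", "Example Report"),
    ("http://example.org/report", "dc:creator", "http://example.org/staff/42"),
    ("http://example.org/staff/42", "ex:name", "A. Author"),
]

graph = defaultdict(list)
for node, attribute, val in statements:
    graph[node].append((attribute, val))


def value(node, attribute):
    """Return the first value of an attribute on a node, or None."""
    for attr, val in graph.get(node, []):
        if attr == attribute:
            return val
    return None


def follow(node, *path):
    """Walk a chain of attributes through the graph."""
    for attribute in path:
        node = value(node, attribute)
        if node is None:
            return None
    return node


author_name = follow("http://example.org/report", "dc:creator", "ex:name")
```

Because the value of "dc:creator" is itself a node, the name of the author is 
reached by following two labeled edges. 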

2.3 Interoperability 

As a specific example of a full protocol for interoperability among clients and 
providers, we present an overview of the Z39.50 protocol. [?] provides a concise 
description of the protocol and a discussion of its deployment. Detailed information 
about the protocol is available through [?]. The problem of interoperability 
has been addressed in depth by digital library projects in the context of the NSF 
Digital Libraries Initiative (DLI) [?]. This section also provides a survey of 
interoperability-related results from this initiative. 

Z39.50 (Information Retrieval (Z39.50); Application Service Definition and 
Protocol Specification, ANSI/NISO Z39.50-1995) is a protocol which allows a 
client machine to search databases on a server machine and retrieve the records 
that are identified as results of such a search. Each server hosts one or more 
databases containing records. Associated with each database are a set of ac- 
cess points (indices) that can be used for searching. The protocol is stateful and 
connection-oriented. A search produces a set of records, called a “result set”, 
that is maintained on the server; the result of a search is a report of the number 
of records comprising the result set. Result sets can be combined or further re- 
stricted by subsequent searches. Records from the result set can be subsequently 
retrieved by the client, by issuing requests that contain options for controlling 
the contents and format of the records to be returned. Z39.50 defines a query lan- 
guage for specifying searches that rely on registered attribute set definitions that 
specify the names of access points, and various record syntaxes for transferring 
records from the server to the client, such as the MARC syntax for bibliographic 
data. 
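
The stateful interaction pattern can be sketched with an in-memory stand-in 
for a server. This is not the Z39.50 wire protocol or query language: the 
class ToyRetrievalServer, its method names and the substring matching are 
assumptions chosen to show how result sets live on the server between requests. 

```python
class ToyRetrievalServer:
    def __init__(self, databases):
        self.databases = databases   # database name -> list of record dicts
        self.result_sets = {}        # session state kept on the server side

    @staticmethod
    def _matches(record, access_point, term):
        return term.lower() in str(record.get(access_point, "")).lower()

    def search(self, database, access_point, term, result_set):
        hits = [r for r in self.databases[database]
                if self._matches(r, access_point, term)]
        self.result_sets[result_set] = hits
        return len(hits)             # the search reports only a hit count

    def refine(self, source_set, access_point, term, result_set):
        hits = [r for r in self.result_sets[source_set]
                if self._matches(r, access_point, term)]
        self.result_sets[result_set] = hits
        return len(hits)

    def present(self, result_set, start, count, fields):
        """Retrieve records, with options controlling contents returned."""
        chunk = self.result_sets[result_set][start:start + count]
        return [{f: r[f] for f in fields if f in r} for r in chunk]


server = ToyRetrievalServer({"catalog": [
    {"title": "Digital Libraries", "author": "Smith", "year": 1996},
    {"title": "Digital Signal Processing", "author": "Jones", "year": 1990},
    {"title": "Library Management", "author": "Smith", "year": 1985},
]})
n_first = server.search("catalog", "title", "digital", "rs1")
n_second = server.refine("rs1", "author", "smith", "rs2")
records = server.present("rs2", 0, 10, ["title"])
```

The client holds only the names of result sets; the records themselves are 
transferred only when explicitly requested. 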

A comprehensive interoperability solution is provided by the InfoBus test- 
bed in the context of the Stanford Integrated Digital Libraries Project, 
which applies distributed object technology to enable interoperability, by using 
wrapper objects to present a unified interface of digital library services and a 
metadata architecture [?] to maintain metadata. The InfoBus set of interoperability 
protocols has been augmented with support for customized coordination 
for automating specific types of transactions. An example of such a specific 
solution is given in [?], which presents event-driven models for different modes 
of consumer-to-merchant interaction, and an API to facilitate commerce tran- 
sactions. Shopping models encapsulate the rules for specific types of commerce 
transactions and instruct the participants what to do next in the way of ordering, 
payment and delivery. Another example is given in [?], which addresses 
the issue of rights management and discusses the automation of certain aspects 
of contract negotiation and enforcement. 

In the University of Michigan Digital Library (UMDL) Project [?], interacting 
software agents cooperate and compete for resources in the context of 
a virtual information economy to provide library services to users. Each agent 
performs a highly specialized library task and has a generic communication inter- 
face, and represents either a resource or a functional unit of the overall system. 
In order to service user requests, functional units and resources need to be dy- 
namically combined, by forming teams of agents under the guidance of query 
planning agents. An essential aspect of this approach is that the capabilities and 
requirements of any functional unit in the digital library system are explicitly 
described using a formal language. Using this facility, it is possible for agents to 
negotiate about provision of complex services as well as resource allocation. A 
distinguishing aspect of the UMDL testbed is that resource allocation is handled 
using a market metaphor [?]. 

An intriguing aspect of interoperability is that, as pointed out in [?], comparing 
solutions is very difficult since different approaches operate under differing 
assumptions, and their design goals may be conflicting. Currently there are no 
quantitative metrics for evaluating interoperability solutions. However, certain 
criteria provide guidelines for understanding distributed and interoperable digital 
libraries, by allowing evaluators to articulate the design goals and to understand 
trade-offs among them [?]. The criteria include the degree of component 
autonomy, the cost of the infrastructure, scalability in the number of compo- 
nents, the ease and relative cost of contributing and using components, and the 
breadth of task complexity supported. 

2.4 Intellectual Property Rights Management 

The protection of intellectual property rights is a fundamental issue for digital 
library systems, and is closely related to the more general issue of enforcing 
specific semantics for interactions with information content and services (see 
Sections 3.2 and [?]). The notion of contract is receiving considerable attention 
in the context of rights management and access control, which are key issues 
as the interest in electronic commerce is growing. Contracts provide a powerful 
framework for expressing and managing complex relationships between transaction 
participants. The Digital Property Rights Language (DPRL) [?] draws its 
basic constructs from contract law to provide a framework for specifying specific 
rights to use and manipulate digital content and actions taken in response to 
usage of these rights. The Stanford Framework for Interoperable Rights Manage- 
ment (FIRM) defines a programmable rights management service layer to 
support rights/relationship management. The FIRM Common Rights Language 
Object Model is an interface specification that describes how generic concepts 
and principles from contract law are reified digitally. FIRM Object Attribute 
Models are based on a standard format for defining media-specific or domain- 
specific rights vocabularies. The RManage relationship manager application is 
a prototype implementation of the FIRM interface, providing implementations 
for contracts such as group licenses and subscriptions. 

Container technology is the key element in InterTrust Technologies’ InterTrust 
Commerce Platform [3]. The DigiBox secure container enables the 
association (via cryptographic means) of rules and controls with information 
content, to specify the types of content usage permitted and the consequences 
of usage (such as payment and report generation). Containers are manipula- 
ted using a trusted rights protection application to make the protected content 
available according to its associated access control rules. Similar functionality is 
provided by IBM’s Cryptolope container [?]. 
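
The pairing of content with usage rules and consequences can be sketched as 
below. The sketch deliberately omits the cryptographic binding that the 
products above rely on, and the class UsageContainer, the usage names and the 
fee amounts are hypothetical. 

```python
class UsageContainer:
    """Content bundled with rules on permitted usage and its consequences."""

    def __init__(self, content, rules):
        self._content = content
        self._rules = rules          # usage type -> fee in cents
        self.report = []             # consequence: usage report entries
        self.balance_due = 0         # consequence: accumulated payment

    def use(self, user, usage):
        if usage not in self._rules:
            raise PermissionError(f"usage {usage!r} is not permitted")
        fee = self._rules[usage]
        self.balance_due += fee
        self.report.append((user, usage, fee))
        return self._content


box = UsageContainer("chapter text", {"view": 0, "print": 25})
viewed = box.use("alice", "view")
printed = box.use("alice", "print")
```

In an actual secure container the rules could not be bypassed by reading the 
content directly; in this sketch the separation is by convention only. 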

2.5 Extended Document Models 

A central argument in this paper is that broadening the scope of electronic 
documents is essential for extending the functionality of digital library systems. 
This extension builds upon the capability to encapsulate descriptive metadata 
within digital objects that allow clients to discover at run-time the functionality 
supported, as well as the associated terms, conditions and guarantees. 

Related work is reported in [?], which describes the distributed active 
relationships model for representing data and metadata in digital library objects, as 
an extension of the Warwick Container Framework [?], which provides a unifying 
abstraction for handling metadata that follow diverse standards. This model 
explicitly expresses the relationships between networked resources and allows 
these relationships to be dynamically downloadable and executable. Containers 
are used for aggregating data sets into digital objects, with the ability to support 
relationships by referencing executable code that may enforce special semantics, 
as illustrated in a rights management scenario. 

[?] describe ComMentor, a prototype developed in the context of the Stanford 
Integrated Digital Libraries Project that enables sharing of structured 
in-place annotations attached to documents on the WWW. This is presented 
as a specific instantiation of a general virtual document architecture in which 
viewed documents can incorporate material that is dynamically integrated from 
distributed sources. The mechanism for shared annotation provides the basis for 
adding value-added super-structures to WWW documents and supporting on- 
line communities by allowing multiple individuals to create lightweight “trails” 
through the shared document space. Example applications include shared com- 
ments, collaborative filtering, seals of approval for content rating, and multiple 
guided tours through the same content. 

The multivalent document model presented in [?] is an attempt to support 
active and networked documents. A multivalent document is decomposed 
into layers of content, and functionality is provided by behaviors, which are 
dynamically-loaded program objects that manipulate the content. A behavior 
may communicate with specific layers and other behaviors in order to provide 
its functionality. Once built, a multivalent document can be extended by adding 
new layers and behaviors. This model provides a broad framework for documents 
that incorporate active behavior in response to user interactions, and [47] 
describes examples of its application, including an example of interaction with a 
remote service. [?] presents a framework for annotations, which are implemented 
as special behaviors of multivalent documents. 
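
The layer and behavior decomposition can be illustrated with a small sketch. 
It is not the multivalent document implementation: the class name, the layer 
names and the two behaviors are assumptions, with annotation modeled as one 
behavior among others. 

```python
class MultivalentDocument:
    def __init__(self):
        self.layers = {}        # layer name -> content
        self.behaviors = []     # program objects that manipulate the layers

    def add_layer(self, name, content):
        self.layers[name] = content

    def add_behavior(self, behavior):
        self.behaviors.append(behavior)

    def render(self):
        view = dict(self.layers)        # behaviors work on a copy
        for behavior in self.behaviors:
            behavior(view)
        return view


def annotate(view):
    """Behavior that reads two layers and writes a derived one."""
    notes = view.get("annotations", {})
    words = view["text"].split()
    view["display"] = " ".join(
        f"{w}[{notes[i]}]" if i in notes else w for i, w in enumerate(words))


def emphasize(view):
    """Behavior that builds on the output of an earlier behavior."""
    view["display"] = view["display"].upper()


doc = MultivalentDocument()
doc.add_layer("text", "digital library systems")
doc.add_layer("annotations", {1: "see sec. 2"})
doc.add_behavior(annotate)
before = doc.render()["display"]

# Once built, the document is extended by adding another behavior.
doc.add_behavior(emphasize)
after = doc.render()["display"]
```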

Developments related to the XML (Extensible Markup Language) metalanguage 
[?] make it possible to extend the functionality of electronic documents. 
XML, which is a subset of SGML [?], provides a standardized text format for 
describing structured data for use by WWW applications. XML documents are 
composed of entities, which are storage units containing text and/or binary data. 
Character streams form both a document’s data and markup. Markup, in the 
form of tags, describes a document’s layout and structure. Thus, XML-encoded 
data is self-describing. A document may optionally be associated with a Docu- 
ment Type Definition (DTD) that defines structuring rules, allowing validation of 
the data. A well-formed XML document is unambiguous, allowing standard bro- 
wsing and editing tools to read the tags and construct a parse tree representing 
its hierarchical structure, without requiring the corresponding DTD. Extremely 
diverse structured data can be encoded using standard tag sets (markup), and 
exchanged either between applications and clients that need to display and mani- 
pulate it, or between application servers for the purposes of automated processing 
[?]. Examples of data that can be exchanged via XML-encoded documents 
include purchase orders, invoices, product catalogs, sets of records retrieved from 
database systems, results from scientific experiments, bibliographic catalogs, and 
reports with embedded annotations. Since XML is a text-based format, it can be 
delivered via the HTTP protocol. Furthermore, XML encoding, unlike HTML, 
separates presentation/rendering issues from actual data content, allowing for 
example multiple views to be generated from the same data. Finally, the Document 
Object Model [?] provides a platform- and language-neutral interface 
allowing programs and scripts to dynamically access and update the content, 
structure and style of documents. 
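
Two of these points, parsing a well-formed document without its DTD and 
deriving several views from the same data, can be shown with Python's standard 
xml.etree.ElementTree module. The purchase order vocabulary below is a made-up 
example and not a standard tag set. 

```python
import xml.etree.ElementTree as ET

order_xml = """
<order id="42">
  <item sku="A1" qty="2">Widget</item>
  <item sku="B7" qty="1">Gadget</item>
</order>
"""

# A well-formed document can be parsed into a tree without any DTD.
root = ET.fromstring(order_xml)
items = [(i.get("sku"), int(i.get("qty")), i.text) for i in root.findall("item")]

# Two different views generated from the same data.
text_view = "\n".join(f"{qty} x {name}" for _, qty, name in items)
total_quantity = sum(qty for _, qty, _ in items)

# A document that is not well-formed is rejected by the parser.
try:
    ET.fromstring("<order><item></order>")
    well_formed = True
except ET.ParseError:
    well_formed = False
```

Validation against a DTD is a separate step that this standard-library parser 
does not perform; the sketch relies on well-formedness only. 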

WIDL [?] is a metadata syntax, implemented in XML, that defines programmatic 
interfaces to Web content and services, so as to enable automated 
and structured access by client programs. WIDL definitions include the location 
(URL) of each service, input parameters to be submitted (via the GET and 
POST methods of HTTP), and output parameters to be returned by each service 
(as regions of returned documents). It is also possible to specify conditions for 
successful completion of a service and error indications to be returned to clients. 
Conditions further enable chaining of services, so that a client can issue requests 
that incorporate multiple services. A related reference is [?], which presents an 
approach in which domain-specific markup languages are used to handle interac- 
tions in a peer-to-peer system. These languages are understood both by agents 
and the humans who interact with them. The objective is to dynamically integrate 
tools into distributed collaborative applications. SGML [?] is used as the 
metagrammar system for specifying markup languages. 
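
The kind of information carried by such an interface definition can be 
sketched in code. The sketch does not reproduce WIDL syntax: the ServiceDef 
fields, the regular-expression "regions", the example URLs and the fake_fetch 
function standing in for HTTP are all assumptions for illustration. 

```python
import re
from dataclasses import dataclass


@dataclass
class ServiceDef:
    url: str        # location of the service
    method: str     # "GET" or "POST"
    inputs: list    # names of input parameters to submit
    outputs: dict   # output name -> regex whose group 1 is a page region
    success: str    # condition: regex that must match the returned page


def call(service, params, fetch):
    missing = [name for name in service.inputs if name not in params]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    page = fetch(service.url, service.method,
                 {name: params[name] for name in service.inputs})
    if not re.search(service.success, page):
        raise RuntimeError("service did not complete successfully")
    result = {}
    for name, pattern in service.outputs.items():
        match = re.search(pattern, page)
        if match is None:
            raise RuntimeError(f"output {name!r} not found in response")
        result[name] = match.group(1)
    return result


def fake_fetch(url, method, params):
    """Stand-in for an HTTP request, returning canned documents."""
    if url.endswith("/lookup"):
        return "<p>OK</p><b id='isbn'>12345</b>"
    if url.endswith("/price"):
        return f"<p>OK</p><span>price: 10 for {params['isbn']}</span>"
    return "<p>ERROR</p>"


lookup = ServiceDef("http://example.org/lookup", "GET", ["title"],
                    {"isbn": r"<b id='isbn'>(\d+)</b>"}, r"OK")
price = ServiceDef("http://example.org/price", "POST", ["isbn"],
                   {"price": r"price: (\d+)"}, r"OK")

# Chaining: the outputs of one service become the inputs of the next.
found = call(lookup, {"title": "Example Report"}, fake_fetch)
quoted = call(price, found, fake_fetch)
```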



3 Outlook 

This section builds on the brief survey of interoperability-related research 
presented in Section 2 to suggest extensions to support our vision of an integrated 
work support environment. Broadening the scope of digital library systems to 
encompass support throughout all aspects of work necessitates an extended view 
of electronic documents to allow for combinations of information content 
together with executable components. Section 3.1 makes the point that container 
technology can provide the necessary framework for building value-added 
services for digital information objects. Section 3.2 presents specific examples of 
value-added services for digital information objects that illustrate the issues of 
integrated work support in an open and dynamic environment. Section 3.3 
outlines our view of an integrated work support environment, which is structured as 
an information economy. Finally, Section 3.4 emphasizes the requirement for 
mutual respect of terms and conditions by the trading partners of the information 
economy, and outlines the design of an application model and an infrastructure 
for interactions in this environment. 



3.1 Container Frameworks 

A component is a software module, encapsulating application code, that can be 
combined with other components and a script to produce a custom application 
environment [?]. Components execute within containers, which provide the 
run-time environment for one or more components and a range of standard ma- 
nagement and control services for these components. A component model defines 
the guidelines that developers must adhere to, in the form of standard interfaces 
that enable other active components or applications to invoke its functions and 
access its data. Moreover, a component model defines the customizable properties 
exposed by components, allowing a component to be adapted to specific applica- 
tion requirements. Component-based application development involves selecting 
appropriate components and assembling them into a configuration that supports 
the functions required for an application. 

In the context of digital library systems and electronic documents, it has 
been demonstrated that container technology enables the encapsulation of infor- 
mation content together with rules and controls specifying the types of content 
usage permitted, as well as the consequences of usage (such as payment and 
report generation). There are proposals for using containers as a mechanism for 
securing intellectual property rights (see Section 2.4). As discussed in more detail 
in Section 3.2, other applications of container technology are possible as well, 
including support for compound documents that incorporate active content and 
automation of processes involving multi-party peer-to-peer interactions for the 
purposes of collaboration and commerce. Such value-added services are of particular 
interest in the context of digital libraries aiming to provide functionality 
extending beyond that of a simple repository of electronic documents. Recent de- 
velopments in software component frameworks contribute toward such extended 
functionality. With the emergence of a new generation of component-based software, 
such as Java Beans [22], there are now powerful programming environments 
for building components, basic building-block components for building compo- 
nent ensembles, and component-based applications to address specific business 
requirements. Components can be combined in a variety of ways, resulting in a 
high degree of productivity for developers and users. A particularly important 
aspect is that visual application builders and scripting languages can be used 
for composition of components. In this setting, extensible containers become 
essential for managing and deploying components. 

Containers in the Aurora architecture (see [?] for details) encapsulate software 
components together with one or more information content modules, and 
related metadata enabling introspection, in other words dynamic discovery and 
inquiry of the container’s capabilities and information content. Containers pro- 
vide a framework for constructing, managing, and deploying compound docu- 
ments, with the additional provision of support for active behavior. Such active 
compound documents enable several value-added services for digital objects in 
a digital library setting. Moreover, they provide the basis for a work session 
framework, by allowing external entities (such as a session manager) to esta- 
blish networks of related containers in order to enact desired flows of data and 
events. This service flow paradigm is essential in the unified treatment of diverse 
applications. 
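
The following sketch illustrates introspection and externally established 
flows between containers. It is not the Aurora implementation: the Container 
class, its describe, connect and receive methods, and the two example 
components are hypothetical. 

```python
class Container:
    def __init__(self, name, component, content, metadata):
        self.name = name
        self._component = component   # callable: (content, message) -> result
        self._content = content       # information content modules
        self._metadata = metadata     # description of capabilities
        self._downstream = []

    def describe(self):
        """Introspection: capabilities and content inventory, no invocation."""
        return {"name": self.name, "modules": sorted(self._content),
                **self._metadata}

    def connect(self, other):
        """Called by an external entity, such as a session manager."""
        self._downstream.append(other)

    def receive(self, message):
        result = self._component(self._content, message)
        for container in self._downstream:
            container.receive(result)     # data and events flow onward
        return result


def lookup(content, key):
    return content["table"][key]


def scale(content, number):
    out = number * content["factor"]
    content["log"].append(out)
    return out


source = Container("lookup", lookup, {"table": {"a": 3}},
                   {"accepts": "key", "emits": "number"})
sink = Container("scale", scale, {"factor": 10, "log": []},
                 {"accepts": "number", "emits": "number"})

# The session manager inspects both containers and wires them together.
compatible = source.describe()["emits"] == sink.describe()["accepts"]
if compatible:
    source.connect(sink)
first = source.receive("a")
```

Neither component refers to the other; the flow exists only because an 
external party connected the two containers after inspecting their metadata. 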

3.2 Value-Added Services for Digital Information Objects 

Container technology enables several value-added services for digital information 
objects. The following examples demonstrate that by taking a broader view of 
electronic documents it is possible to extend the functionality and scope of digital 
library systems. This point is elaborated in Section 3.3. 

Active Compound Documents. A container object can encapsulate multiple 
modules of information content, together with a software component that media- 
tes access to this content, optionally providing enhanced access services. Thus, 
a container supports a framework of active compound documents. A compound 
document can incorporate, apart from its main information content, backgro- 
und material to provide additional insight to the document’s recipients, as well 
as capabilities for interaction with data and tools related with the document. 
Structured documents provide a powerful interaction metaphor, particularly 
well-suited for collaborative applications. 

Fig. 1. Active Compound Documents in a Collaborative Work Session. 

Electronic Commerce. Realising commerce transactions among multiple trading 
parties involves a complex sequence of interdependent actions, spanning 
over a long period of time, where each action may involve interactions with 
information systems as well as humans. Current electronic commerce systems do 
not support the notion of a complex product/service package consisting of items 
from multiple providers [?]. They are limited to commerce transactions over 
items in a single catalog or in an electronic mall that hosts multiple stores. For 
electronic commerce to reach its full potential, it is necessary to provide an open 
infrastructure that supports combining functional modules developed and admi- 
nistered by autonomous providers. Scripting can be used to express the required 
event-driven flow of requests and data among components that implement the 
required business functions. Support for scripting enables commerce transactions 
involving product/service packages. 

Work Sessions in Scientific Experiments. The management of work sessi- 
ons involving the collection, manipulation, and management of data sets which 
can be generated from a large number and variety of sources, poses major chal- 
lenges for current work support technologies, such as information management, 
computer-supported collaborative work, and workflow automation. The main 
objective is not necessarily to automate all tasks of the process, but rather to 
automate the tracking of states of tasks and to allow specification of precon- 
ditions to decide when tasks are ready to be executed and of information flow 
between tasks [?]. This view, together with the requirement for interoperation, 
motivates the integration of coordination and collaboration technologies with 
information management (see Section 3.3). Such integration can provide a shared 
workspace as the basis for collaboration among participants in a work session, 
allowing participants to invoke services and publish the results they get in res- 
ponse to their requests. Participants can discover at run-time how to invoke a 
particular service, by looking up the service’s registration entry in the directory 
maintained by the run-time environment. Publish/subscribe communication 
support [32] allows all “subscribers” to receive the results produced by invoking a 
service. Moreover, specialized components may act as large-grain caches of data 
sets. In this case, these components become accessible by registering with the 
service directory and advertising their capability to provide access to data sets 
that have been derived by a sequence of manipulations on primitive data sets. 



Terms, Conditions, and Guarantees. Realising an information economy requires
mechanisms for monitoring and enforcing mutually binding terms and
conditions between users and providers of information content and services, and 
a shift from asset management to relationship management |S3|. The encap- 
sulation of metadata within electronic documents is a powerful mechanism for 
providing documentation of the terms, conditions and guarantees determined 
by providers of information material and services. This information can be ins- 
pected by users at run-time, prior to actual usage of the encapsulated content 
and services. Moreover, it is possible to design generic enforcement mechanisms. 
As a specific example, we are developing a service-level management infrastructure
in the context of the Aurora architecture (see Section 3.4 for details). This
functionality is based on the ability to intercept all incoming and outgoing mes- 
sages targeted at a component that provides access to information content and 
services. Thus a level of indirection is provided to enable performing authoriza- 
tion checks, logging, and any other pre-processing and post-processing actions 
required for the enforcement of agreed-upon conditions and guarantees. 

3.3 Integrated Work Support 

Sharing the views expressed in I27EHI, we believe that information management, 
as supported by current-generation digital library systems, needs to be augmen- 
ted with support for coordination and collaboration, as currently supported by 
workflow EIT51 and groupware systems iRii4yi . in order to provide an integrated 
work support environment. Such broad scope necessitates an extended view of 
electronic documents, to evolve from pure containers of information in electronic 
form toward digital information objects that can be described as active compo- 
und documents, combining information content together with executable code to 
introduce active behavior. Moreover, as a result of the autonomy of the service 
and information content providers, this environment takes the form of an in- 
formation economy where software objects that encapsulate information content 
and business-oriented services play the role of goods, and objects representing 
clients and content/service providers play the role of trading partners. A crucial
aspect in such an environment, which resembles a marketplace where clients
and providers interact by exchanging as well as trading services and information 
content na, is the mutual respect of terms and conditions. This requirement, 
which is also fundamental for the preservation of intellectual property rights in 
digital library systems, provides us with motivation to develop an infrastructure 
supporting a form of electronic contracts that explicitly declare the terms and
conditions under which certain services and information content are to be used, 
as well as guarantees that providers commit to maintain for their clients. 

Along these lines, we are developing Aurora PS], an infrastructure for large- 
scale distributed applications supporting dynamic composition and configura- 
tion, as well as contract specification, monitoring and enforcement. Aurora sup- 
ports a service flow execution model, where composite services involving multiple 
resources are realized in the context of work sessions as primitive service requests 
among components that have been configured appropriately to enable interope- 
ration. Components are encapsulated in containers, which export uniform inter- 
faces for monitoring and control, as well as management of asynchronous service 
requests. The developer’s task is to identify appropriate components and “plug” 
them together, via a form of scripting as in m- Scripts describe the desired 
configuration of components for realizing a work process; however, this configu- 
ration can be inspected and manipulated at run-time. Dynamic configuration 
m is achieved by invoking state inspection and control operations exported by 
the containers. A federated directory service enables service providers to publish 
their services and clients to search for offers of interest. This is achieved by ha- 
ving each service provider register entries for the components that it is exporting 
to provide services to clients. Each such entry includes a list of attribute-value
pairs that allow containers, scripts, and development tools to “discover” and 
utilize a component’s capabilities. 
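
The registration and lookup steps described above can be sketched in a few lines of Python. This is only an illustration: the class name ServiceDirectory, its methods, and the attribute names are hypothetical and are not taken from the Aurora implementation, and a real federated directory would be distributed rather than a single in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One registration entry: a component name plus attribute-value pairs."""
    component: str
    attributes: dict = field(default_factory=dict)


class ServiceDirectory:
    """Toy, in-memory stand-in for a federated directory service."""

    def __init__(self):
        self._entries = []

    def register(self, component, **attributes):
        # A provider publishes a component by registering an entry for it.
        self._entries.append(Entry(component, dict(attributes)))

    def lookup(self, **required):
        # Return the components whose entries match every requested pair.
        return [
            e.component
            for e in self._entries
            if all(e.attributes.get(k) == v for k, v in required.items())
        ]


directory = ServiceDirectory()
directory.register("catalog-search", kind="search", protocol="http")
directory.register("dataset-cache", kind="cache", protocol="http")
matches = directory.lookup(kind="cache")
```

A client or script would call lookup at run-time with the attributes it needs and then invoke whichever component is returned.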



3.4 Service-Level Agreements 

The concept of service-level management m comes from enterprise data pro- 
cessing centers, typically based on mainframe computing systems that provide 
comprehensive monitoring and resource control facilities. The overall goal is to 
define and maintain required levels of service for the user population of an enter- 
prise. Our ongoing work on the Aurora architecture aims to provide this facility 
in dynamic open systems as well. 

A service-level agreement (SLA) documents the expected behavior of service 
providers, for a given client or client class. The information contained in the 
SLA explicitly defines non-functional attributes of the services (such as availa- 
bility, failure handling, performance, and security) in order to provide guidance 
to clients. This information, in the form of specifications of the measurements 
and events that a component exports together with associated control operati- 
ons, and metadata describing the guarantees and execution policies supported, 
complements the specification of service functional interfaces. It can be used for 
selecting among alternative service offers, based on the client’s requirements on 
attributes of the service. Ongoing proof of conformance to a SLA requires the 
ability to produce on-line reports on the delivered service levels, thus achieving 
accountability, which entails that the availability and level of performance of all 
entities involved in workflow processing be tracked and maintained according 
to predetermined levels. This aspect is particularly important for work sessions 
that span organization boundaries. SLA enforcement requires on-line monitoring
of the delivered service levels and the actual resource/service demands, and
the ability to invoke configuration and control actions to affect the behavior 
of active sessions. As a specific example, m describes the Aurora monitoring 
infrastructure for supporting SLA management. 

By allowing clients to obtain up-to-date information on the supported gua- 
rantees for reliable execution and expected performance, the SLA specification 
contributes to making services in a dynamic open environment more predicta- 
ble, and, in this sense, more manageable. Clients are provided with guidance 
in planning a strategy for obtaining service despite failures and unpredictable 
performance, which are inherent characteristics of dynamic open environments 
such as the Internet. A client can use the information exported by the SLA to 
set timeouts and to schedule retries and compensating actions in case of failure. 
Moreover, a client can abort or cancel its requests when the measured service- 
level parameters (such as transfer rate and response time) become worse than 
prespecified thresholds. Service providers explicitly state, and maintain up-to-date,
the guarantees that they can provide for their service offers under specific run-time
conditions and subject to specific terms and conditions of use. In a sense, a con- 
tract is established between providers and clients, in the context defined by the 
particular request and current run-time conditions. The promises of providers 
and the corresponding rights of clients are documented, and therefore can 
be continuously monitored. 

It is important that the autonomy of service providers is not compromised, 
as a service provider is the only authority responsible for exporting an interface 
for use by clients and for establishing and enforcing service attributes such as 
transactional and performance guarantees. Another important point is that a 
service provider may combine multiple services, made available by other autonomous
providers, in order to provide a composite service to a client. A client does
not need to be aware of this complexity. Moreover, other service providers may not
be aware that their services are being used in the context of a composite service 
request. The SLA exported by a service provider hides such implementation as- 
pects, exporting only aspects related to the service level that a client can expect, 
together with information about available actions to compensate for actions that 
were not completed successfully or that the client wishes to revoke. 

4 Conclusions 

Dynamic composition of services emerges as a key common requirement for se- 
veral network-centric applications in open environments, such as digital libraries, 
electronic commerce, and scientific collaborative work EH- As demonstrated in 
Section iz.hl container technology is expected to play an increasingly important 
role as a building block for sophisticated digital library services. We consider a 
view of digital libraries that goes beyond that of a repository of on-line informa- 
tion, to provide support throughout all phases of work. This vision is incorpora- 
ted in our ongoing research toward an integrated work support environment that 
is structured as an information economy. A distributed run-time environment
based on open protocols serves as the basis of an open and dynamic federation of 
resources and services owned and managed by autonomous authorities. Towards 
this end, container technology contributes a framework for encapsulating value- 
added services in complex-structured documents, including support for active 
and dynamic content, as well as support for collaborative work and automated 
interactions with diverse services. 

Developments related to the XML and RDF efforts by the W3C Consor- 
tium are expected to provide the foundation for machine-understandable self- 
descriptive information regarding services, information content and other re- 
sources. This is essential for supporting automated as well as human-driven 
processes in the context of an information economy. With the increasing em- 
phasis on open systems comprising widely distributed autonomous providers of 
services and information content, we consider service level management to be an 
essential requirement, as an integrated approach to handling the issues of terms, 
conditions and guarantees. Such developments will broaden the scope of digital
library systems, and contribute towards establishing a comprehensive work
support environment, with a profound impact on our lives. 
Abstract. This paper provides an overview of the state of the art in
the design of cryptographic algorithms. It reviews the different types of
algorithms for encryption and authentication and explains the principles 
of stream ciphers, block ciphers, hash functions, public-key encryption 
algorithms, and digital signature schemes. Subsequently the design and 
evaluation procedures for cryptographic algorithms are discussed. 



1 Introduction 

In our society, digital information and the systems and networks carrying this in- 
formation are abused under many forms: financial transactions are eavesdropped 
or modified, sensitive information of individuals and organizations is eavesdrop- 
ped or stolen, electronic services are used without paying for them, and computer 
systems and networks are broken into or brought down. The tools to perform this 
vary from simple bugs, password sniffers, and password crackers, over malicious 
software such as viruses and malicious applets, to complete hacker workbenches. 
Traditionally computer networks existed within one organization, and one tried 
to defend them against an opponent that came from outside the system. Now 
that we move to open and global networks, and that we are entering an era of 
electronic commerce, a more complex threat model arises: we cannot even trust 
the parties we are dealing with, and the system has to be designed to fight fraud 
within the system. For example, in an electronic transaction system, sellers can 
deny having sent an order if it turns out badly, and traders can deny having 
received an order when it turns out profitable (in order to keep the money). 
The risk for misuse has increased considerably, as potential attackers can ope- 
rate from all over the globe. Moreover, if someone gains access to an electronic 
information system, the scale and impact of the abuse can be much larger than 
in a paper-based system. 

These risks create the need for adequate security measures to protect electro- 
nic information systems. It is clear that in an electronic world physical security or 
personnel security by itself cannot be sufficient. An essential component of every 
secure information system is formed by cryptographic techniques. Other impor- 
tant building blocks are secure operating systems and procedural aspects such as 

* F.W.O. postdoctoral researcher, sponsored by the Fund for Scientific Research - 
Flanders (Belgium).
audit tools and management guidelines. Building secure computer systems and
networks requires a conservative approach which is not always compatible with 
the current rapid developments in the industry; moreover, security has to
be kept in mind from the first step of the design.

In this paper we discuss the principles underlying the design of cryptogra- 
phic algorithms; we distinguish between confidentiality protection (the protec- 
tion against passive eavesdroppers) and authentication (the protection against 
active eavesdroppers, who try to modify information). Then we review the diffe- 
rent issues that arise when selecting, designing, and evaluating a cryptographic 
algorithm. Finally we present some concluding remarks. 



2 Encryption for Secrecy Protection 

The use of cryptography for protecting the secrecy of information is as old as
writing itself (for an excellent historical overview, see D. Kahn d)- The basic 
idea is to apply a ‘complicated’ transformation to the information to be pro- 
tected. When the sender (usually called Alice in cryptography) wants to send 
a message to the recipient (Bob), she will apply to the plaintext P the
mathematical transformation E(). This transformation E() is called the encryption
algorithm; the result of this transformation is called the ciphertext or C = E(P).
Bob will decrypt C by applying the inverse transformation D = E^{-1}; this way
he recovers P or P = D(C). For a secure algorithm E, the ciphertext C does
not make sense to outsiders: Eve, who is tapping the connection, can obtain C, 
but she cannot obtain (partial information on) the corresponding plaintext P. 

This approach only works when Bob can keep the transformation D secret. 
While this is acceptable for a person-to-person exchange, it is not feasible for 
large scale use. Bob needs a software or hardware implementation of D; either
he has to program it himself, or he has to trust someone to write the program 
for him. Moreover, he will need a different transformation (and program) for 
each correspondent, which is not very practical. Bob and Alice always have to 
face the risk that somehow Eve will obtain D (or E), for example by bribing 
the author of the software or their system manager, or by breaking into their 
computer system. 

This problem can be solved by introducing into the encryption algorithm 
E() a secret parameter, the key K. Typically such a key is a binary string of
40 to a few thousand bits. A corresponding key K* is used for the decryption
algorithm D. One has thus C = E_K(P) and P = D_{K*}(C) (see also Figure 1,
which assumes that K* = K). The transformation has to depend strongly (and
in a very complicated way) on the keys: if one uses a wrong key K*' ≠ K*, one
does not obtain the plaintext P but a ‘random’ plaintext P'. Now it is possible
to publish the encryption algorithm E() and the decryption algorithm D(); the
security of the system relies only on the secrecy of two short keys. This implies
that E() and D() can be evaluated publicly and distributed on a commercial
basis. One can think of the analogy with a mechanical lock: everyone knows how 
such a lock works, but in order to open a particular lock, one needs to know the 
key or the secret combination. The assumption that the algorithm is known to
the opponent is known in cryptography as “Kerckhoffs’s principle”; Kerckhoffs 
was a 19th century Dutch cryptographer who was the first to formulate this 
approach. 

A simple example of an encryption algorithm is the so-called ‘Caesar cipher,’ 
after the Roman emperor who used it. The plaintext is encrypted letter by letter; 
the ciphertext is obtained by shifting the letters over a fixed number of positions 
in the alphabet. The secret key indicates the number of positions. It is claimed 
that Caesar always used the value of three, such that “an example” would
be encrypted to “dq hadpsoh”. Another example is the name of the computer
“hal” from S. Kubrick’s “2001: A Space Odyssey”, which was obtained by
replacing the letters of “ibm” by their predecessor in the alphabet. This corresponds
to a shift over 25 positions. It is clear that such a system is completely
insecure. 
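
As a minimal sketch of the Caesar cipher just described (the function names caesar and brute_force are our own, and the 26-letter lowercase alphabet is an assumption), the following Python fragment reproduces both examples and shows why the scheme is insecure: there are only 26 keys to try.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar(text, shift):
    """Shift every letter of `text` by `shift` positions; keep other characters."""
    out = []
    for ch in text.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def brute_force(ciphertext):
    """Decrypt under each of the 26 possible keys; entry k undoes a shift of k."""
    return [caesar(ciphertext, -k) for k in range(26)]


ciphertext = caesar("an example", 3)   # key = 3
hal = caesar("ibm", 25)                # key = 25, i.e. one position back
candidates = brute_force(ciphertext)
```

An opponent only has to scan the 26 candidates for the one that reads as sensible text.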

A problem which has not yet been addressed is how Alice and Bob exchange 
the secret key. The easy answer is that cryptography does not solve this problem; 
cryptography only makes problems easier. In this case the secrecy of a (large) 
plaintext has been reduced to that of two short keys, which can be exchanged 
beforehand. The problem of exchanging keys is studied in more detail in an
area of cryptography that is called ‘key management’. We will not discuss it in 
further detail here. 







Fig. 1. Model for conventional or symmetric encryption 

The branch of science which studies the encryption of information is called 
cryptography. A related branch tries to ‘break’ encryption algorithms, by recovering
the plaintext without knowing the key or by deriving the key from the
ciphertext and parts of the plaintext; it is called cryptanalysis. The term crypto- 
logy covers both aspects. For more extensive introductions to cryptography, the 
reader is referred to |2li5li9i20l25l2d] . 

Thus far we have assumed that the key for decryption K* is equal to the 
encryption key K, or that it is easy to derive K* from K. Algorithms of this
type are called conventional or symmetric ciphers. In public-key or asymmetric
ciphers, K* and K are always different; moreover, it should be difficult to compute
K* from K. This has the advantage that one can make K public, which
has important implications for the key management problem. The remainder of
this section discusses conventional algorithms and public-key algorithms. 
2.1 Conventional Encryption

This section introduces the two most common conventional encryption algo- 
rithms: additive stream ciphers and block ciphers. 



Additive Stream Ciphers. Additive stream ciphers are ciphers for which the
encryption consists of a modulo 2 addition (exclusive or, exor) of a key stream to
the plaintext (see Figure 2). The plaintext and ciphertext are divided into words
of m bits (m is typically 1, 8, or a multiple of 8), and the ith word of the plaintext,
ciphertext, and key stream is denoted with p_i, c_i, and k_i, respectively. The
encryption operation can then be written as c_i = p_i ⊕ k_i. Here ⊕ denotes addition
modulo 2. The decryption operation is identical to the encryption (the cipher is
an involution): indeed, p_i = c_i ⊕ k_i = (p_i ⊕ k_i) ⊕ k_i = p_i ⊕ (k_i ⊕ k_i) = p_i ⊕ 0 = p_i.
It is clear that the m-bit key stream word k_i cannot be a constant (in that case
a cryptanalyst can compute the key stream word from a single ciphertext word
and the corresponding plaintext word; also repetitions in the plaintext would be
visible in the ciphertext). One can show that for a strong cipher the sequence of
k_i has to consist of randomly looking strings (see also Sect. 2.2).
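
The involution property and the weakness of a constant key stream word can be checked with a short Python sketch. It assumes m = 8 (byte-sized words); the helper name xor_words and the arithmetic key stream are illustrative only, and the key stream used here is in no way random-looking.

```python
def xor_words(data: bytes, key_stream: bytes) -> bytes:
    """Add the key stream to the data modulo 2, word by word (m = 8 bits)."""
    if len(key_stream) < len(data):
        raise ValueError("key stream is shorter than the data")
    return bytes(d ^ k for d, k in zip(data, key_stream))


plaintext = b"attack at dawn"
# Illustrative key stream only; a real one must look random.
key_stream = bytes((37 * i + 11) % 256 for i in range(len(plaintext)))

ciphertext = xor_words(plaintext, key_stream)
recovered = xor_words(ciphertext, key_stream)   # the same operation decrypts

# With a constant key stream word, one known plaintext/ciphertext pair
# reveals the word, and repeated plaintext words stay visible.
constant = bytes([0x5A]) * len(plaintext)
weak = xor_words(plaintext, constant)
leaked_word = weak[0] ^ plaintext[0]
```

Here weak[0] and weak[3] are equal because the plaintext has the letter "a" in both positions, which is exactly the repetition leak mentioned above.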

In practice one computes the words k_i with a finite state machine. Such a
machine stretches a short secret key K into a much longer key stream sequence
k_i; this is called a pseudo-random string generator. The sequence k_i is eventually
periodic. One important (but not sufficient) design criterion for the finite state 
machine is that the period has to be sufficient long (2^^^ is a typical lower bound). 
The values k_i should also be uniformly distributed; another condition is that
there should be no correlations between (parts of) successive words (note that
cryptanalytic attacks exist which exploit correlations of less than 1 in 1 million).
Formally, the sequence k_i can be parameterized with a security parameter; then
one requires that the sequence satisfies every polynomial time statistical test
for randomness (here polynomial means polynomial in the security parameter). 
Another desirable property is that no polynomial time machine can predict the 
next bit of the sequence (based on the previous outputs) with a probability that 
is significantly better than 1/2. An important (and perhaps surprising) result 
in theoretical cryptology by A. Yao shows that these two conditions are in fact 
equivalent m- 
Fig. 2. An additive stream cipher

Block Ciphers. Block ciphers take a different approach to encryption: the
plaintext is divided into larger words of n bits, called blocks. Every block is enci- 
phered in the same way, using a keyed one-way permutation, i.e., a permutation 
on the set of n-bit strings that is controlled by a secret key. The simplest way to 
encrypt a plaintext using a block cipher is as follows: divide the plaintext into 
n-bit blocks, and encrypt these block by block. The decryption also operates on 
individual blocks: 



c_i = E_K(p_i) and p_i = D_K(c_i).

This way of using a block cipher is called the ECB (Electronic CodeBook) mode. 
Note that the encryption operation does not depend on the location in the 
ciphertext as is the case for additive stream ciphers. 

Consider the following attack on a block cipher (the so-called tabulation 
attack): the cryptanalyst collects ciphertext blocks and their corresponding 
plaintext blocks (this is possible as part of the plaintext is often predictable); 
this is used to build a large table. With such a table, one can deduce informa- 
tion on other plaintexts encrypted under the same key. In order to preclude this 
attack, the value of n has to be quite large (e.g., 64 or 128) and the plaintext 
should not contain any repetitions (or other patterns), as these will be leaked to 
the ciphertext. 

This shows that even if n is large, the ECB mode is not suited to encrypt 
plaintexts that are not random (such as text, images, etc.). This mode should 
only be used in very special cases, where the plaintext is already random, such as 
the encryption of cryptographic keys. There is however an easy way to randomize 
the plaintext, by using the block cipher in a different mode of operation. 

The default mode of operation for a block cipher is the CBC (Cipher Block 
Chaining) mode. In this mode the different blocks are coupled by adding modulo 
2 to a plaintext block the previous ciphertext block: 

c_i = E_K(p_i ⊕ c_{i-1}) and p_i = D_K(c_i) ⊕ c_{i-1}.

Note that this ‘randomizes’ the plaintext, and hides patterns. To enable the 
encryption of the first plaintext block (i = 1), one defines c_0 as the initial value
IV. By varying this value, one can ensure that the same plaintext is encrypted 
into a different ciphertext under the same key. The CBC mode allows for random 
access on decryption: if necessary, one can decrypt only a small part of the 
ciphertext. 
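
The difference between the ECB and CBC modes can be made concrete with a toy example in Python. The block cipher below is a hypothetical 4-round Feistel network built from SHA-256, chosen only because it is a keyed permutation that runs with the standard library; it is not the DES and has no claim to security. The sketch also assumes that the message length is a multiple of the block size, so no padding is shown.

```python
import hashlib

BLOCK = 8  # block size in bytes (n = 64 bits)


def _round(half: bytes, key: bytes, r: int) -> bytes:
    # Round function of the toy cipher: a keyed hash truncated to 4 bytes.
    return hashlib.sha256(key + bytes([r]) + half).digest()[:4]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(block: bytes, key: bytes) -> bytes:
    """Toy 4-round Feistel network: a keyed permutation on 8-byte blocks."""
    left, right = block[:4], block[4:]
    for r in range(4):
        left, right = right, _xor(left, _round(right, key, r))
    return left + right


def decrypt_block(block: bytes, key: bytes) -> bytes:
    """Inverse permutation: undo the rounds in reverse order."""
    left, right = block[:4], block[4:]
    for r in reversed(range(4)):
        left, right = _xor(right, _round(left, key, r)), left
    return left + right


def blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(data: bytes, key: bytes) -> bytes:
    return b"".join(encrypt_block(b, key) for b in blocks(data))


def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = [], iv
    for p in blocks(data):
        prev = encrypt_block(_xor(p, prev), key)       # c_i = E_K(p_i XOR c_{i-1})
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = [], iv
    for c in blocks(data):
        out.append(_xor(decrypt_block(c, key), prev))  # p_i = D_K(c_i) XOR c_{i-1}
        prev = c
    return b"".join(out)


key, iv = b"demo key", bytes(range(BLOCK))
message = b"SAMEBLOK" * 3            # three identical plaintext blocks
ecb = blocks(ecb_encrypt(message, key))
cbc = blocks(cbc_encrypt(message, key, iv))
```

In ECB mode the three ciphertext blocks are identical, so the repetition in the plaintext is visible; in CBC mode the chaining hides it. Note that cbc_decrypt needs only c_{i-1} and c_i to recover p_i, which is what makes random access on decryption possible.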

It is also possible to use a block cipher as an additive stream cipher by feeding 
the output back to the input; this mode is known as the OFB (Output FeedBack) 
mode. A second stream mode is the CFB (Cipher FeedBack) mode; it has better
synchronization properties. The modes of operation have been standardized in 

Eini- 

This section has illustrated that a block cipher forms a very flexible building 
block. The most famous block cipher is the Data Encryption Standard (or DES) 
0, which is widely used since 1977. The DES has a block size of 64 bits and a 
key length of 56 bits; it will be shown in Sect. l4.;il that this is no longer sufficient. 
Therefore the US government is planning to replace it by a new block cipher,
the AES (Advanced Encryption Standard). To this end, an open call for algorithms
was launched in September ’97; 15 candidates were submitted by the
deadline of June ’98. Currently the evaluation procedure is under way. The AES
will have a block length of 128 bits and a key length between 128 and 256 bits.

2.2 Security of Conventional Algorithms 

An essential aspect in the choice of an encryption algorithm is the security level. 
In 1926 G.S. Vernam published a simple encryption algorithm for telegraphic
messages EZ). The cipher is an additive stream cipher, where the key stream 
consists of a completely random sequence, generated by a binary symmetric 
source (all bits are uniformly and identically distributed). In 1949 C. Shannon, 
the father of information theory, was able to prove mathematically that this 
scheme offers perfect security, i.e., from observing the ciphertext, the opponent 
cannot obtain any information on the plaintext, no matter how much computing 
power he has m The main disadvantage of this scheme is that the secret key is 
exactly as long as the message (one should never reuse a key stream); C. Shannon 
also showed that this is the best one can do if one wants perfect security. In spite
of the long key, the Vernam algorithm is still used by diplomats and spies; it has 
been used for the ‘red telephone’ between Washington and Moscow. Spies used 
to carry key pads with random characters (it is easy to generalize the scheme 
to arbitrary alphabets) . The security of the scheme relies on the fact that every 
page of the pad is used only once, which explains the name “one-time pad” . 
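
A small Python sketch can illustrate two of the points above, under the assumption of byte-sized words. The variable names are our own; secrets.token_bytes stands in for the binary symmetric source. The first part shows informally why the ciphertext alone rules out no plaintext of the same length; the second shows what leaks when a pad is used twice.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


message = b"MEET AT NOON"
pad = secrets.token_bytes(len(message))   # key exactly as long as the message
ciphertext = xor(message, pad)

# Every plaintext of the same length is explained by some key, so an
# opponent who sees only the ciphertext cannot prefer one over another.
other = b"STAY AT HOME"
key_for_other = xor(ciphertext, other)

# Reusing the pad for a second message leaks the XOR of the two plaintexts.
second = xor(other, pad)
leak = xor(ciphertext, second)
```

The value of leak does not depend on the pad at all, which is why every page of the pad must be used only once.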

In most commercial applications one cannot afford to distribute keys which 
are as long as the plaintext. Therefore one uses encryption algorithms which do 
not offer perfect security; this implies that it is in principle possible to recover 
the plaintext and/or the secret key from the ciphertext, in the sense that one 
has sufficient information to do this. This does not mean that it is also possible 
in practice. For example, additive stream ciphers try to mimic the approach of 
the Vernam scheme by replacing the random key stream sequence by a pseudo- 
random sequence generated from a short key. 

2.3 Public-Key Encryption 

The main problem that is left unsolved by conventional cryptography is the key 
distribution problem. Especially in a large network it is not feasible to distribute 
keys between all user pairs (in a network with t users there are t(t - 1)/2 such
pairs). An alternative is to manage all keys in a central location, but this may 
then become a single point of failure. Public-key cryptography offers a much 
more elegant solution to this problem. 

The concept of public-key cryptography was invented in 1976, independently
by W. Diffie and M. Hellman P| and by R. Merkle EH|. The key idea
behind public-key cryptography is the concept of trapdoor one-way functions. 
A one-way function is a function that is easy to compute, but hard to invert. 
For example, in a conventional block cipher, the ciphertext has to be a one-way 
function of the plaintext and the key: it is easy to compute the ciphertext from
the plaintext and the key, but given the plaintext and the ciphertext it should 
be hard to recover the key (otherwise the block cipher would not be secure). 
Similarly one can show that the existence of additive stream ciphers (pseudo- 
random string generators) implies the existence of one-way functions. Trapdoor 
one-way functions are one-way functions with an additional property: given some
additional information (the trapdoor), it becomes possible to invert the one-way 
function. 

With such functions Bob can send a secret message to Alice without the need 
for prior arrangement of a secret key. Alice chooses a trapdoor one-way function 
with public parameter P_A (Alice’s public key) and with secret parameter S_A
(Alice’s secret key). Alice makes her public key widely available (she can put it
on her home page, but it can also be included in special directories). Anyone who
wants to send some confidential information to Alice, computes the ciphertext
as the image of the plaintext under the trapdoor one-way function using the
parameter P_A. Upon receipt of this ciphertext, Alice recovers the plaintext by
using her trapdoor information S_A (see Figure 3). An attacker, who does not
know S_A, sees only the image of the plaintext under a one-way function, and
will not be able to recover the plaintext. This assumes that it is infeasible to
compute S_A from P_A. Note that if one wants to send a message to Alice, one
has to know Alice’s public key P_A, and one has to be sure that this key really
belongs to Alice (and not to Eve), since it is only the owner of the corresponding 
secret key who will be able to decrypt the ciphertext. Public keys do not need 
a secure channel for their distribution, but they do need an authentic channel. 
As the keys for encryption and decryption are different, and Alice and Bob 
have different information, public-key algorithms are also known as asymmetric 
algorithms. 




Fig. 3. Model for public-key or asymmetric encryption 

The conditions which a public-key encryption algorithm has to satisfy are: 

- the generation of a key pair (P_A, S_A) has to be easy;

- encryption and decryption have to be easy operations;

- it should be hard to compute the secret key S_A from the corresponding
public key P_A;

- S_A(P_A(P)) = P.

Designing a secure public-key encryption algorithm is apparently a very dif- 
ficult problem. From the large number of proposals, only a few have survived 
(for example, almost all knapsack-based systems have been broken). The most
popular algorithm is the RSA algorithm [22], which was named after its inventors
(R.L. Rivest, A. Shamir, and L. Adleman). The security of RSA is based
on the fact that it is relatively simple to find two large prime numbers (in 1998 
large means 115 decimal digits or more) and to multiply these, while factoring 
their product (of 230 decimal digits) is not feasible with the current algorithms 
and computers. 

key generation: Find two prime numbers p and q with at least 115 digits and
compute their product, the modulus n = p · q. Compute the Carmichael
function λ(n), the least common multiple of p - 1 and q - 1. Choose an
encryption exponent e (at least 40 to 64 bits long) that is relatively prime
to λ(n), and compute the decryption exponent as d = e^(-1) mod λ(n) (with
Euclid’s algorithm). The public key consists of the pair (e, n), and the secret
key consists of the decryption exponent d or the pair (p, q);
encryption: represent the plaintext as an integer in the interval [0, n - 1] and
compute the ciphertext as C = P^e mod n;
decryption: P = C^d mod n.
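
The three steps above can be illustrated with a toy Python sketch. The primes 61 and 53 and the exponent 17 are illustrative assumptions only (far too small to offer any security), the function names are hypothetical, and no encoding or padding of the plaintext is performed.

```python
import math

def rsa_keygen(p: int, q: int, e: int):
    """Return ((e, n), d) for primes p, q and encryption exponent e."""
    n = p * q
    # Carmichael function of n = p*q: least common multiple of p-1 and q-1.
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    if math.gcd(e, lam) != 1:
        raise ValueError("e must be relatively prime to lambda(n)")
    d = pow(e, -1, lam)   # modular inverse (Python 3.8+), i.e. extended Euclid
    return (e, n), d

def rsa_encrypt(public_key, plaintext: int) -> int:
    e, n = public_key
    return pow(plaintext, e, n)        # C = P^e mod n

def rsa_decrypt(d: int, n: int, ciphertext: int) -> int:
    return pow(ciphertext, d, n)       # P = C^d mod n

public_key, d = rsa_keygen(61, 53, 17)
n = public_key[1]
ciphertext = rsa_encrypt(public_key, 65)
recovered = rsa_decrypt(d, n, ciphertext)
```

Note that only the holder of d (or of p and q) can run the last step, whereas encryption uses the public pair (e, n) alone.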

Without explaining the mathematical background of the algorithm, one can observe
that decryption requires the extraction of modular e-th roots; no algorithm
is known for this problem which does not use the prime factors of n; finding the
decryption exponent requires knowledge of λ(n) and hence of the factors of n.
On the other hand, this knowledge is not required for the encryption operation.
For the practical use of RSA, one has to take into account many technical details:
for example, the plaintext P has to be mapped (with a function that is
easy to invert) to a random integer in [0, n - 1] in order to avoid trivial attacks
(e.g., the extraction of natural e-th roots when P^e < n).
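
The trivial attack mentioned above can be sketched as follows. This is an illustration under assumed parameters: the modulus is the product of the two Mersenne primes 2^61 - 1 and 2^89 - 1, the exponent is e = 3, and the helper name integer_root is hypothetical. Because the unpadded plaintext satisfies P^e < n, the modular reduction never takes effect and an ordinary integer root recovers P without the secret key.

```python
def integer_root(x: int, e: int) -> int:
    """Return the largest integer r with r**e <= x (binary search)."""
    lo, hi = 0, 1
    while hi ** e <= x:
        hi *= 2
    # Invariant: lo**e <= x < hi**e
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** e <= x:
            lo = mid
        else:
            hi = mid
    return lo

n = (2**61 - 1) * (2**89 - 1)    # assumed toy modulus, about 150 bits
e = 3
plaintext = 123456789             # small unpadded plaintext: plaintext**e < n
ciphertext = pow(plaintext, e, n) # equals plaintext**e, no reduction occurs
recovered = integer_root(ciphertext, e)
```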

The more complex properties of public-key cryptography seem to require 
some ‘high level’ mathematical structure; most public-key algorithms are based
on number theoretic problems (such as factoring and discrete logarithm in cer- 
tain groups). While these number theoretic problems are believed to be difficult, 
it should be noted that since the invention of public-key cryptography signifi- 
cant progress has been made in factoring: the factorization record in 1975 was 
39 decimal digits; in 1985 this was increased to 65 digits, and in 1994 a 130-digit 
modulus was factored. This evolution is due to a combination of more sophi- 
sticated factoring algorithms with progress in hardware and parallel processing. 
The cryptographer should take this into account by selecting sufficiently large 
keys for public-key algorithms. 

The main advantage of public-key algorithms is the simplified key manage- 
ment. The main disadvantages are the larger keys (typically 64 to 256 bytes) 
and the slow performance: both in software and hardware public-key encryp- 
tion algorithms are two to three orders of magnitude slower than conventional 
algorithms. For example, a 1024-bit exponentiation requires about 0.3 seconds 
on a 90 MHz Pentium, which corresponds to 3.4 kbit/s. On the same machine, 
DES runs at 16.9 Mbit/s. Because of the large difference in performance and the 
larger block length (which influences error propagation), one always employs 
hybrid systems: the public-key encryption scheme is used to distribute a secret
key, which is then used in a fast conventional algorithm. 

3 Hashing and Signatures for Authentication

Information authentication includes two main aspects: 

- data origin authentication, or who has originated the information; 

- data integrity, or has the information been modified. 

Other aspects which can be important are the timeliness of the information, the 
sequence of messages, and the destination of information. These aspects can be 
accounted for by using sequence numbers and time stamps in the messages and 
by including addressing information in the data. In data communications, the 
implicit authentication created by recognition of the handwriting, signature, or 
voice disappears. The reason is that information becomes much more vulnerable 
to falsification as the physical coupling between information and its bearer is 
lost. 

Until recently it was widely believed that encryption of information (with a 
conventional algorithm) was sufficient for protecting its authenticity. The reaso- 
ning was that if a certain ciphertext resulted after decryption in a meaningful 
plaintext, it had to be created by someone who knew the key, and therefore it 
must be authentic. A few counterexamples are sufficient to refute this claim: if 
a block cipher is used in ECB mode, an attacker can always reorder the blocks. 
For any additive stream cipher (including the Vernam scheme), an opponent can 
always modify any plaintext bit (without knowing whether a 0 has been changed 
to a 1 or vice versa). The concept ‘meaningful’ information implicitly assumes 
that the information contains redundancy, which allows one to distinguish genuine
information from an arbitrary plaintext. However, one can envisage applicati- 
ons where the plaintext contains very little or no redundancy. The separation 
between secrecy and authentication has also been clarified by public-key crypto- 
graphy: anyone who knows Alice’s public key can send her a confidential message, 
and therefore Alice has no idea who has actually sent this message. 
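
The stream cipher counterexample can be made concrete with a short Python sketch. The message, the chosen bit position, and the function names are assumptions made for illustration; the keystream is drawn at random and used once, as in the Vernam scheme.

```python
import secrets

def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    """Additive stream cipher: encryption and decryption are the same XOR."""
    return bytes(d ^ k for d, k in zip(data, keystream))

def flip_bit(ciphertext: bytes, byte_index: int, bit: int) -> bytes:
    """Attacker's move: flip one ciphertext bit; no key or plaintext is needed."""
    tampered = bytearray(ciphertext)
    tampered[byte_index] ^= 1 << bit
    return bytes(tampered)

plaintext = b"PAY 100 EUR TO BOB"
keystream = secrets.token_bytes(len(plaintext))
ciphertext = xor_bytes(plaintext, keystream)

# The receiver decrypts the tampered ciphertext without noticing anything:
# exactly the targeted plaintext bit has changed.
decrypted = xor_bytes(flip_bit(ciphertext, 4, 0), keystream)
```

The secrecy of the message is unaffected (the attacker learns nothing about the plaintext), yet its integrity is lost, which is the point of the counterexample.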

Two different levels of information authentication can be distinguished. If 
two parties trust each other and want to protect themselves against malicious 
outsiders, the term ‘conventional message authentication’ is used. In this setting, 
both parties are on an equal footing (for example, they share the same secret key).
If however a dispute arises between them, a third party (such as a judge) will
not be able to resolve it (for example a judge cannot tell whether a message has
been created by Alice or by Bob). If protection between two mutually distrustful
parties is required (which is often the case in commercial relationships), an el- 
ectronic equivalent of a manual signature is needed. In cryptographic terms this 
is called a digital signature. 

3.1 Symmetric Authentication 

The underlying idea is similar to that for encryption, where the secrecy of a 
large amount of information is replaced by the secrecy of a short key. In the 
case of authentication, one replaces the authenticity of the information by the
protection of a short string, which is a unique ‘fingerprint’ of the information. 
Such a ‘fingerprint’ is computed as a hash result. This can also be interpreted as 
adding a special form of redundancy to the information. This process consists of 
two components. First one compresses the information to a string of fixed length, 
with a (cryptographic) hash function. Then the resulting string (the hash result) 
is protected as follows: 

- either the hash result is communicated over an authentic channel (e.g., it can 
be read over the phone). It is then sufficient to use a hash function without
a secret parameter, which is called a Manipulation Detection Code or MDC; 

- or the hash function uses a secret parameter (the key); it is then called a 
Message Authentication Code or MAC. 



MDCs. If an additional (authentic) channel is available, MDCs can provide 
authenticity without requiring secret keys. Moreover an MDC is a flexible pri- 
mitive, which can be used for a variety of other cryptographic applications. An 
MDC has to satisfy the following conditions: 

- it should be hard to find an input with a given hash result (preimage resi- 
stance); 

- it should be hard to find a second input with the same hash result as a given 
input (2nd preimage resistance); 

- it should be hard to find two different inputs with the same hash result 
(collision resistance). 

An MDC satisfying these three conditions is called a collision resistant hash 
function. For a strong hash function with an n-bit result, solving one of the first
two problems requires about 2^n evaluations of the hash function. This implies
that n = 64 ... 80 is sufficient. However, finding collisions is much easier: one
will find with high probability a collision in a set of hash results corresponding
to 2^(n/2) inputs. This implies that collision resistant hash functions need a hash
result of 128 to 160 bits. This last property is also known as the birthday paradox
based on the following observation: within a group of 24 persons the probability 
that there are two persons with the same birthday is about 50%. The reason 
is that a group of this size contains 276 different pairs of persons, which is a 
large fraction of the 365 days in a year. Note that the birthday paradox plays 
an essential role in the security of many cryptographic primitives (cf. Sect. 4.3).
Examples of MDCs in use today are RIPEMD-160 and SHA-1; both have been 
standardized in H2|. Not all applications need collision resistant hash functions; 
sometimes (2nd) preimage resistance is sufficient. 
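
The figures quoted for the birthday paradox can be checked with a few lines of Python. This is only a numerical sketch; the function name is hypothetical and birthdays are assumed to be uniformly distributed over 365 days.

```python
import math

def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

pairs = math.comb(24, 2)                          # number of pairs among 24 persons
probability = birthday_collision_probability(24)  # slightly above one half
```

The same reasoning applied to an n-bit hash result, with 2^n possible values in place of 365 days, leads to the estimate of about 2^(n/2) inputs for a collision.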



MACs. MACs have been used for more than twenty years in electronic tran- 
sactions in the banking environment. They require the exchange of a secret key 
between the communicating parties. The MAC corresponding to a message is a 
complex function of every bit of the message and every bit of the key; it should 
be infeasible to derive the key from observing a number of text/MAC pairs, or
to compute or predict a MAC without knowing the secret key. 

A MAC is used as follows: Alice computes for her message P the value
MAC_K(P) and appends this MAC to the message (here MAC denotes both
the function and its result). Bob recomputes the value of MAC_K(P) based on
the received message P, and verifies whether it matches the received MAC. If
the answer is positive, he accepts the message as authentic, i.e., as a genuine
message from Alice. Eve, the active eavesdropper, can modify the message P to
P', but she is not able to compute the corresponding MAC value MAC_K(P'), as
she is not privy to the secret key K. For a secure MAC, the best Eve can do is
to guess the MAC. In that case, Bob can detect the modification with high probability:
for an n-bit MAC Eve’s probability of success is only 1/2^n. The value
of n lies typically between 32 and 64. Note that if encryption and authentication
are combined, the keys for encryption and authentication need to be different.
Moreover, the preferred option is to compute the MAC on the plaintext. 
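
The exchange between Alice and Bob can be sketched in Python as follows. As a stand-in for the MAC algorithm, the sketch uses HMAC with SHA-256 from the Python standard library, which is an assumption made here and not an algorithm discussed in this paper; the result is truncated to n = 64 bits, and the key, message, and function names are hypothetical.

```python
import hashlib
import hmac

TAG_BYTES = 8   # n = 64 bits

def compute_mac(key: bytes, message: bytes) -> bytes:
    """Alice's side: MAC_K(P), truncated to TAG_BYTES."""
    return hmac.new(key, message, hashlib.sha256).digest()[:TAG_BYTES]

def verify_mac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Bob's side: recompute MAC_K(P) and compare in constant time."""
    return hmac.compare_digest(compute_mac(key, message), tag)

key = b"shared secret key K"
message = b"transfer 100 to Bob"
tag = compute_mac(key, message)       # appended to the message by Alice

accepted = verify_mac(key, message, tag)
# Eve modifies P to P' but cannot recompute the MAC without K.
rejected = verify_mac(key, b"transfer 900 to Eve", tag)
```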

A popular way to compute a MAC is to encrypt the message with a block 
cipher using the CBC mode (yet another use of a block cipher), and to keep 
only part of the bits of the last block as the MAC. However, recent research has 
indicated that this approach is less secure than previously believed EH; again, 
the birthday paradox plays a role in this work. 

For a MAC, the equivalent of the Vernam scheme exists. This implies that 
one can design a MAC algorithm which is unconditionally secure, in the sense 
that the security of the MAC is independent of the computing power of the op- 
ponent. The requirement is again that the secret key is used only once. The basic 
idea of this approach is due to G.J. Simmons and dates back to the seventies 
(see for example EHl). It turns out that these algorithms can be computationally 
very efficient, since the properties required from this primitive are combinato- 
rial rather than cryptographic. Recent constructions are therefore one order of 
magnitude faster than other cryptographic primitives (encryption algorithms, 
hash functions), and achieve speeds up to 1 Gbit/s on fast processors |3j. A 
simple example is described here, which is derived from Reed-Solomon codes for 
error-correction US]. The key consists of two n-bit words denoted with K_1 and
K_2. The plaintext is divided into t n-bit words, denoted with p_1 through p_t.
The MAC, which consists of a single n-bit word, is computed based on a simple 
polynomial evaluation: 



    MAC_{K_1,K_2}(P) = K_1 + \sum_{i=1}^{t} p_i \cdot (K_2)^i ,

where addition and multiplication are to be computed in the finite field with
2^n elements. It can be proved that the probability of creating another valid
message/MAC pair is upper bounded by t/2^n. A practical choice is n = 64, which
results in a 128-bit key. For messages up to 1 Mbyte, the success probability of
a forgery is then less than 1/2^47. Note that it turns out to be possible to reuse
K_2; however, for every message a new key K_1 is required. This key could be
generated from a short initial key using an additive stream cipher, but then 
the unconditional security is lost. However, one can argue that it is easier to
understand the security of this scheme than that of a computationally secure 
MAC. 
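
The polynomial evaluation MAC can be sketched in Python for a toy word size. The sketch assumes n = 8 rather than the practical n = 64, and represents the field with 2^8 elements using the irreducible polynomial x^8 + x^4 + x^3 + x + 1; both choices, as well as the function names and key values, are assumptions made for illustration.

```python
from typing import Sequence

def gf_mul(a: int, b: int, n: int = 8, poly: int = 0x11B) -> int:
    """Multiply a and b in GF(2^n), reducing by the irreducible polynomial poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^n) is XOR
        b >>= 1
        a <<= 1
        if a >> n:               # degree n reached: reduce by the modulus
            a ^= poly
    return result

def poly_mac(k1: int, k2: int, words: Sequence[int]) -> int:
    """Compute K_1 + sum_{i=1..t} p_i * (K_2)^i over GF(2^8)."""
    mac = k1
    power = 1
    for p in words:
        power = gf_mul(power, k2)    # power now holds (K_2)^i
        mac ^= gf_mul(p, power)
    return mac

message = [0x48, 0x65, 0x6C, 0x6C, 0x6F]          # five 8-bit words
tag = poly_mac(k1=0x3A, k2=0xC5, words=message)   # K_1 must be fresh per message

# Forgery bound t/2^n for the parameters quoted in the text (n = 64, 1 Mbyte):
t = (2**20 * 8) // 64                        # 2^17 words of 64 bits
bound_exponent = 64 - (t.bit_length() - 1)   # t/2^64 = 2^(-bound_exponent)
```

With n = 8 the forgery bound t/2^n is of course far too weak; the sketch only shows the structure of the computation.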

3.2 Digital Signatures 

A digital signature is the electronic equivalent of a manual signature on a do- 
cument. It provides a strong binding between the document and a person, and 
in case of a dispute, a third party can decide whether or not the signature is 
valid. Of course a digital signature will not bind a person and a document, but 
will bind a key and a document. Additional measures are then required to bind 
the person to his or her key. Note that for a MAC, both Alice and Bob can 
compute the MAC, hence a third party cannot distinguish between them. While 
block ciphers (and even one-way functions) can be used to construct digital sig- 
natures, the most elegant and efficient constructions for digital signature rely on 
public-key cryptography. 

If Alice wants to sign some information P intended for Bob, she adds some
redundancy to the information, resulting in P̃, and decrypts the resulting text
with her secret key. This operation can only be carried out by Alice. Upon
receipt of the signature, Bob encrypts it using Alice’s public key, and verifies
that the information P̃ has the prescribed redundancy. If so, he accepts the
signature on P as valid. Such a digital signature (which is a signature with
‘message recovery’) imposes an additional condition on the public-key system:
P_A(S_A(P)) = P. Note that anyone who knows Alice’s public key can verify the
signature. The RSA public-key encryption scheme is a bijection (a trapdoor one- 
way permutation), and thus it allows for the construction of digital signatures 
with message recovery. We leave it as an exercise to the reader to show why the 
redundancy is essential in this approach. 

If Alice wants to sign very long messages (without encrypting them), this 
approach results in signatures that are as long as the message. Moreover, signing 
with a public-key system is a relatively slow operation. In order to solve these 
problems, Alice does not sign the information itself, but the hash result of the 
information computed with an MDC. The signature now consists of a single 
block, which is appended to the information (this is called a digital signature 
‘with appendix’). In order to verify such a signature, Bob recomputes the MDC of
the message and encrypts the signature with Alice’s public key. If both operations
give the same result, Bob accepts the signature as valid. MDCs used in this way
need to be collision resistant: otherwise Alice can sign a message P, and later 
be held accountable for a fraudulent message P' with the same MDC (and thus 
with the same signature). 
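
A signature with appendix can be sketched in Python as follows. The sketch rests on several assumptions: a toy RSA key built from the Mersenne primes 2^61 - 1 and 2^89 - 1 (far too small for real use), SHA-256 as the MDC, and a plain reduction of the hash result modulo n in place of a proper encoding; the function names are hypothetical.

```python
import hashlib
import math

P, Q = 2**61 - 1, 2**89 - 1        # assumed toy primes
N = P * Q
E = 65537                           # public exponent
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
D = pow(E, -1, LAM)                 # Alice's secret exponent

def mdc(message: bytes) -> int:
    """Hash result mapped into [0, N - 1]; the reduction is a toy simplification."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Alice 'decrypts' the hash result with her secret key."""
    return pow(mdc(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Bob 'encrypts' the signature with the public key and compares."""
    return pow(signature, E, N) == mdc(message)

signature = sign(b"I owe Bob 100 EUR")
valid = verify(b"I owe Bob 100 EUR", signature)
forged = verify(b"I owe Bob 900 EUR", signature)
```

The signature is a single block regardless of the message length, and only the hash computation depends on the size of the message.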

Note that there exist other signature schemes with appendix (such as the 
DSA 13), which are not derived immediately from a public-key encryption 
scheme. For these schemes one can define a ‘signing operation’ (using the secret 
key) and a ‘verification operation’ (using the public key), without referring to 
‘decryption’ and ‘encryption’ operations. 
4 Analysis and Design of Conventional Cryptographic Algorithms

In this section we compare three approaches to the design of cryptographic 
algorithms. Next we describe the typical phases in the life of an algorithm. Then 
we contrast brute force and shortcut attacks, public and secret algorithms, and 
weak and strong algorithms. 



4.1 Three Approaches in Cryptography 

Present day cryptology tries to develop provably secure and efficient cryptogra- 
phic algorithms. Often such algorithms are not available; therefore cryptogra- 
phic algorithms are studied following three approaches: the information theoretic 
approach, the complexity theoretic approach, and the system based approach. 
These approaches differ in the assumptions about the capabilities of an oppo- 
nent, in the definition of a cryptanalytic success, and in the notion of security. 

The most desirable from the viewpoint of the cryptographer are unconditio- 
nally secure algorithms; this design approach is also known as the information 
theoretic approach. However, few such schemes exist: examples are the Vernam 
scheme (Sect. 2.1) and the MAC based on Reed-Solomon codes (Sect. 3.1). While
they are computationally very efficient, the cost in terms of key material may 
be prohibitively large (certainly for the Vernam scheme). For most applications 
one has to live with schemes which offer only conditional security. 

A second approach is to reduce the security of a scheme to that of other
well known difficult problems, or to that of other cryptographic primitives. The 
complexity theoretic approach starts from an abstract model for computation, 
and assumes that the opponent has limited computing power within this model 
0. This approach has many positive sides: 

- It forces one to formulate exact definitions and to state clearly the security
properties and assumptions.

- Once the proofs are written down, anyone can verify them and decide
whether or not they are correct.

However, this approach also has some limitations: 

- Many cryptographic applications need building blocks, such as one-way functions,
one-way permutations, and pseudo-random functions, which cannot
be reduced to other primitives. In terms of the existence of such primitives, 
complexity theory has only very weak results: in non-uniform complexity 
(Boolean circuits) the best proved thus far is that there exist functions which 
are twice as hard to invert as to compute, which is far too weak to be of any 
use in cryptography 101 . 

- Sometimes the resulting scheme is not very efficient, or the security reduction
is quite loose: for example, the correct properties are proved, but the proof 
is only asymptotic and gives no indication of the exact security level for a 
concrete instance. 
This implies that for many instances, the cryptographer has to rely on the
system-based or practical approach. This approach tries to produce practical so- 
lutions; the security estimates are based on the best algorithm known to break 
the system and on realistic estimates of the necessary computing power or dedi- 
cated hardware to carry out the algorithm. By trial and error procedures, several 
cryptanalytic principles have emerged, and it is the goal of the designer to avoid 
attacks based on these principles. The second aspect is to design building blocks 
with provable properties, and to assemble such basic building blocks to design 
cryptographic primitives. 



4.2 Life Cycle of a Cryptographic Algorithm 

A cryptographic algorithm usually starts with a new idea of a cryptographer. 
A first step should always consist of an evaluation of the resulting algorithm, in 
which the cryptographer tries to determine whether or not the scheme is secure. If 
the scheme is unconditionally secure, he has to write the proofs, and to convince 
himself that the model is correct and matches the application. For computational 
security, it is again very important to write down security proofs, and to check 
these for subtle flaws. Moreover, one has to assess whether the assumptions 
behind the proofs are realistic. For the system-based approach, it is important 
to prove partial results, and to write down arguments which should convince 
others of the security of the algorithm. Often such cryptographic algorithms 
have security parameters (the number of steps, the size of the key, . ..); it is 
then very important to give lower bounds for these parameters, and to indicate
the value of the parameters which corresponds to a certain security level. 

The next step is the publication of the algorithm at a conference, in a journal, 
or in an Internet Request for Comment (RFC). This (hopefully) results in an 
independent evaluation of the algorithm. Often more or less subtle flaws are then 
discovered by other researchers. This can vary from small errors in proofs, to 
complete security breaks. Depending on the outcome, this can lead to a small fix 
of the scheme or to abandoning the idea altogether. Sometimes such weaknesses 
can be found ‘in real-time’ when the author is presenting his ideas at a conference, 
but often evaluating a cryptographic algorithm is a very time consuming task; for 
example, the design effort of the Data Encryption Standard (DES) has been more 
than 17 man-years, and the open academic evaluation since has taken a multiple 
of this effort. Cryptanalysis is quite destructive; in this respect it differs from 
usual scientific activities, even when proponents of competing theories criticize 
each other. 

Few algorithms survive the evaluation stage; ideally, this stage should last for 
several years. The survivors can be integrated into products and find their way 
to the market. Sometimes they are standardized by organizations such as NIST 
(National Institute of Standards and Technology, US), IEEE, IETF, or ISO. 

As will be explained below, even if no new security weaknesses are found, 
the security of a cryptographic algorithm degrades over time; if the algorithm is 
not modular, the moment will come when it has to be taken out of service. 
4.3 Brute Force Attacks Versus Shortcut Attacks

A detailed description of the evaluation procedures for cryptographic algorithms 
is beyond the scope of this paper. We restrict ourselves to explaining the diffe- 
rence between brute force attacks and shortcut attacks. 



Brute Force Attacks. Brute force attacks are attacks which exist against any 
cryptographic algorithm that is conditionally secure, no matter how it works 
internally. These attacks only depend on the size of the external parameters of 
the algorithm, such as the block length of a block cipher, or the key length of 
any encryption algorithm or MAC. It is the task of the designer to choose the 
external parameters in such a way that brute force attacks are infeasible.

A typical brute force attack against an encryption algorithm or a MAC is 
an exhaustive key search; it is equivalent to breaking into a safe by trying all 
the combinations of the lock. The lock should be designed such that this is not 
feasible in a reasonable amount of time. This attack requires only a few known 
plaintext/ciphertext (or plaintext/MAC) pairs, which one can always obtain in 
practice. It can be precluded by increasing the key length: adding one bit to the 
key doubles the time for exhaustive key search. One should also ensure that the 
key is selected uniformly at random in the key space. 

On a standard PC, trying a single key for a typical algorithm requires a 
few microseconds. For example, a 40-bit key (which is at present the maximum 
value allowed by the US government for general purpose export) will be recovered 
after a few hundred hours. If a LAN with 100 machines can be used, one can 
find the key in a few hours. For a 56-bit key such as DES (which can be exported 
from the US under restrictive conditions), a key search requires a few months if 
several thousand machines are available (as has been demonstrated in the first 
half of 1997). However, if dedicated hardware is used, a different picture emerges. 
Recently a 250 000 US$ machine has been built that finds a 56-bit DES key in 
about 50 hours 0; the design (that required 50% of the cost) has been made 
available for free. 
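
The orders of magnitude quoted above can be reproduced with a short calculation. The sketch assumes 1 microsecond per key trial and, for the 56-bit case, 5000 machines; both figures are assumptions chosen within the ranges given in the text, and the function name is hypothetical. The times are worst-case values; on average the key is found after half the search.

```python
SECONDS_PER_HOUR = 3600

def exhaustive_search_hours(key_bits: int,
                            seconds_per_key: float = 1e-6,
                            machines: int = 1) -> float:
    """Worst-case time, in hours, to try all 2^key_bits keys."""
    return (2 ** key_bits) * seconds_per_key / machines / SECONDS_PER_HOUR

single_pc_40 = exhaustive_search_hours(40)               # a few hundred hours
lan_40 = exhaustive_search_hours(40, machines=100)       # a few hours
days_56 = exhaustive_search_hours(56, machines=5000) / 24  # a few months
```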

One should also take into account “Moore’s law” |^, which states that 
computers double their speed every 18 months (for the same cost). This implies 
that a 64-bit key, which offers a reasonable security level for the time being, is 
probably not sufficient for data which needs to be protected for 10 years. Such 
applications will need keys of at least 80 bits. As the cost of increasing the key 
size is quite low, it is advisable to design new algorithms with variable key size 
up to 128. . . 256 bits. 
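
The effect of Moore's law on key lengths can be expressed as a simple rule of thumb: each doubling of speed removes one bit of effective key length. The sketch below assumes the 18-month doubling period stated above; the function names are hypothetical.

```python
def bits_eroded(years: float, doubling_period_years: float = 1.5) -> float:
    """Key bits 'lost' to faster hardware: one bit per doubling of speed."""
    return years / doubling_period_years

def effective_key_bits(key_bits: int, years: float) -> float:
    """Strength after `years`, measured against today's computing power."""
    return key_bits - bits_eroded(years)

lost_in_10_years = bits_eroded(10)             # between 6 and 7 bits
strength_64_bit = effective_key_bits(64, 10)   # roughly a 57-bit key today
```

Under this estimate a 64-bit key would, after 10 years, offer about the protection that a 57-bit key offers today, which is close to the 56-bit DES key shown above to be within reach of dedicated hardware.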

There exist many other brute force attacks. For example, it turns out the 
security of a block cipher in the CBC mode is decreased by what is called the 
‘matching ciphertext’ attack. As a consequence of the birthday paradox, after 
2^(n/2) encryptions with a single key (n being the block length), information on
the plaintext starts to leak (due to matches in the internal memory, which
correspond to matching ciphertexts). This attack can be a problem for present day
block ciphers with a 64-bit
block length. It can only be prevented by designing new block ciphers with larger 
block lengths (128 or more), or by changing the key frequently. 
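
To see what the matching ciphertext bound means in practice, one can convert 2^(n/2) blocks into a data volume. The following sketch is a plain calculation; the function name is hypothetical.

```python
def matching_ciphertext_bound_bytes(block_bits: int) -> int:
    """Data volume (in bytes) encrypted under one key after 2^(n/2) blocks."""
    blocks = 2 ** (block_bits // 2)
    return blocks * (block_bits // 8)

bound_64 = matching_ciphertext_bound_bytes(64)     # 2^35 bytes, i.e. 32 GiB
bound_128 = matching_ciphertext_bound_bytes(128)   # 2^68 bytes
```

For a 64-bit block the bound is reached after 32 GiB of data under a single key, whereas for a 128-bit block it lies far beyond any realistic data volume.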
Shortcut Attacks. Many algorithms are less secure than suggested by the size
of their external parameters. It is often possible to find more effective attacks 
than trying all keys. Assessing the strength of an algorithm requires cryptanalytic 
skills and experience, and often hard work. During the last 10 years powerful 
new tools have been developed: this includes differential cryptanalysis P, which 
analyzes the propagation of differences through cryptographic algorithms, linear 
cryptanalysis m, which is based on the propagation of bit correlations, and fast 
correlation attacks on stream ciphers im. 

The design of new algorithms according to the system-based approach is not 
a memoryless process: when new cryptanalytic techniques are developed, the 
cryptographers invent new designs which provide complete (or at least improved) 
resistance against these new attacks. In this way cryptology develops by trial and 
error procedures. 

4.4 Public Versus Secret Algorithms 

The open and independent evaluation process described in Sect. 4.2 offers a
strong argument for publishing all details of a cryptographic algorithm. Pu- 
blishing the algorithm opens it up for public scrutiny, and is the best way to 
guarantee that it is as strong as claimed. (Note that a public algorithm should 
not be confused with a public-key algorithm.) Published algorithms can be stan- 
dardized, and will be available from more than one source. 

Nevertheless, certain governments and organizations prefer to keep their al- 
gorithms secret. They argue (correctly) that obtaining the algorithm raises an 
additional barrier for the attacker. Moreover, governments want to protect their 
know-how on the design of cryptographic algorithms. (However, obtaining a de- 
scription of the algorithm is often not harder than just bribing one person.) 
This approach is acceptable, provided that sufficient experience and resources 
are available for independent evaluation and re-evaluation of the algorithm. 

4.5 Insecure Versus Secure Algorithms 

In spite of the fact that secure cryptographic algorithms are available, which 
offer good performance, in many applications one encounters very insecure cryp- 
tographic algorithms. For example, popular software sometimes ‘encrypts’ data 
by adding a constant key word to all data words. Several reasons can be indicated 
for this: 

- one excuse is performance: while it is true that adding a constant will always
be faster than strong encryption, it should be noted that in software, current
encryption algorithms achieve between 20 and 400 Mbit/s; this is sufficient
for many applications;

- legal and/or export restrictions: for national security reasons, certain countries
(such as the USA) do not allow the export of strong encryption algorithms;
some countries (such as France) do not allow for strong encryption
within their territory (unless the keys are handed over to the government);

- commercial pressure: companies often rush their security solutions to market,
without allowing for sufficient time for the slow evaluation process;

- evolution of computing power: the strength of a cryptographic algorithm
erodes over time because of Moore’s law; often there exists a large inertia to
replace or upgrade an algorithm. A typical example is the DES, which is still
widely used in spite of its 56-bit key;

- evolution of cryptanalysis: if designers are not aware of the latest developments
in cryptanalysis, it is quite likely that their algorithms will not resist
these attacks. For example, the FEAL block cipher with 8 rounds, which was
published in 1987, can now be broken with only 10 chosen plaintexts.
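
The weakness of the ‘encryption’ by adding a constant key word, mentioned at the start of this section, can be shown in a few lines of Python. The 32-bit word size, the key value, and the function names are assumptions made for illustration; a single known plaintext word suffices to recover the key.

```python
MODULUS = 2**32   # assumed word size of 32 bits

def weak_encrypt(words, key_word):
    """'Encrypt' by adding the same key word to every data word."""
    return [(w + key_word) % MODULUS for w in words]

def weak_decrypt(words, key_word):
    return [(w - key_word) % MODULUS for w in words]

def recover_key(known_plain_word: int, cipher_word: int) -> int:
    """One known plaintext/ciphertext word pair reveals the key."""
    return (cipher_word - known_plain_word) % MODULUS

secret_key = 0xDEADBEEF
data = [0x00000001, 0x12345678, 0xFFFFFFFF]
ciphertext = weak_encrypt(data, secret_key)

# The attacker knows (or guesses) only the first data word.
guessed_key = recover_key(data[0], ciphertext[0])
recovered = weak_decrypt(ciphertext, guessed_key)
```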

5 Concluding Remarks 

Securing an application should be based on a careful analysis of the risks and 
vulnerabilities; this should lead to understanding the security requirements for 
the data and the communication channels. The next step consists of selecting 
the right mix of cryptographic algorithms to satisfy these requirements. A very 
important aspect is the underlying key management infrastructure, which ensu- 
res that private and public keys can be established and maintained throughout 
the system in a secure way. This is where cryptography meets the constraints of 
the real world. 

This paper only scratches the surface of modern cryptology, as the discussion 
is restricted to a few basic techniques. Other problems solved in cryptography 
include secure identification, secure sharing of secrets, electronic cash, and co- 
pyright protection. Many interesting problems are studied under the umbrella 
of secure multi-party computation; examples are electronic elections, and the 
generation and verification of digital signatures in a distributed way. 
Abstract. This document aims at describing main issues in the area of
structured multimedia documents. Documents can be modelled through 
four main dimensions (logical, hypermedia, spatial and temporal) and 
will be illustrated by the main corresponding standards (SGML/XML, 
HTML, CSS, DSSSL/XSL, SMIL). Building authoring tools that are 
capable of dealing with these dimensions (and especially the temporal one)
is still a great challenge. We describe some authoring applications and 
develop temporal aspects of documents through the analysis of new spe- 
cification and authoring needs required for handling multimedia docu- 
ments. 
1 Introduction

Electronic documents have been the scope of numerous research activities for 
years. These works have led to the identification of the main characteristics
attached to documents and to their modeling through several dimensions such 
as the logical, physical, navigational and temporal ones 0. One of the major 
results of that is the emergence of standards such as XML |2S| (Extensible Markup
Language), HTML PS], HyTime [E], DSSSL 0, SMIL j2D], etc. These standards 
aim at making easier the processing, the exchange and the sharing of documents 
through different computers, systems, software and networks. 

New technologies of data representation and processing allow the use of 
image, video and sound information in computer applications. Depending on 
the targeted application, these new media types can be more or less integra- 
ted into the whole information system. For example, a video/audio channel of a 
teleconferencing application is completely independent from other information 
sources. In this paper, we are interested in applications where combining pieces 
of information from various media types into a unique entity, called a multimedia 
document, is of high priority. Typical examples are multimedia titles on CD-ROMs
or web documents including synchronized video or audio. 
This paper provides an overview of major concepts and techniques on which
electronic documents technology is based. The first part is devoted to the de- 
scription of general concepts of documents through the identification of four 
main dimensions: logical, physical, navigational and temporal. The next one will 
focus on the management of structured documents; that area will be illustra- 
ted by the description of some authoring applications and some transformation 
techniques. Finally, in the last section we more deeply develop temporal aspects 
of documents and present new specification and authoring needs required for 
handling multimedia documents. 



2 Models for Electronic Documents 

2.1 Electronic Documents 

With the advent of hypertext, on-the-fly document generation and multimedia 
technologies, it becomes more and more difficult to provide a clear definition of the 
notion of document. For the purpose of this talk, we will consider a document 
as a set of basic information entities semantically linked together in order to 
constitute a message. We will not discuss further where the semantic limit has 
to be put, but we will focus on the way to express the organization of basic 
information entities. 

The elementary entities that compose documents have either a static or a 
dynamic nature: static objects include strings, graphics, images or mathematical 
symbols and dynamic objects include those having a duration such as animations, 
audios or videos. The duration of a dynamic object may be intrinsic to the object, 
as for audio, or may remain undetermined until the presentation stage: a typical 
example is an interaction button whose duration is given at the presentation stage 
by a reader action (a mouse click). 



2.2 The Four Dimensions of Documents 

Roughly speaking, a document can be considered as a set of basic components 
organized according to four kinds of structure. These structural levels can 
be considered as four independent dimensions: 

— The logical dimension (chapters, sections, paragraphs, etc.). 

— The navigational dimension (hypertext links, actions). 

— The spatial dimension (page layout, presentation, style sheets). 

— The temporal dimension (multimedia synchronization, scenario description). 

This way of modeling documents provides a homogeneous framework for 
representing most categories of documents: from conventional documents such as 
technical reports, letters, scientific articles to graphics or hypermedia structures. 

The core of that document model is the expression of object composition in 
each dimension, as for example: 
— Logical composition: "A book is composed of a title, an author and a set of 
chapters, each of them being a list of paragraphs". 

— Spatial composition: "A footnote must be set at the foot of the page on 
which its first reference appears". 

— Navigational composition: "A link is created between each bibliographic 
entry and all its references", "The architecture of a web site is defined by 
the HTML links between its pages". 

— Temporal composition: "When a company presentation starts, its logo is 
displayed for 5 seconds, then the manager's picture is shown during his 
speech; the end of the presentation is composed of a 3-minute video of the 
products of the company together with music". 

Note that the composition may depend on the nature of the objects 
that are composed (for instance, a sound has no spatial position). 
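
The temporal composition example above (logo, then picture with speech, then video with music) can be sketched as a small timeline computation. This is only an illustration under stated assumptions: the names `Interval`, `sequence` and `parallel` are hypothetical helpers, and the speech length is not given in the text, so a value is assumed.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    name: str
    start: float  # seconds from the beginning of the presentation
    end: float


def sequence(start, items):
    """Place (name, duration) pairs one after another, beginning at `start`."""
    placed, t = [], start
    for name, duration in items:
        placed.append(Interval(name, t, t + duration))
        t += duration
    return placed, t


def parallel(start, items):
    """Start all (name, duration) pairs at `start`; the group ends with the longest."""
    placed = [Interval(name, start, start + duration) for name, duration in items]
    return placed, max(item.end for item in placed)


SPEECH_SECONDS = 60  # assumption: the text does not state the speech length

timeline, t = sequence(0, [("logo", 5)])
part, t = parallel(t, [("manager_picture", SPEECH_SECONDS), ("speech", SPEECH_SECONDS)])
timeline += part
part, t = parallel(t, [("products_video", 180), ("music", 180)])
timeline += part

for item in timeline:
    print(f"{item.name:16s} {item.start:6.0f} -> {item.end:6.0f}")
```

Only two composition operators are used here (one after the other, and at the same time); richer temporal relations are discussed in the last section.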

Numerous models and languages have been proposed for the specification of 
these different kinds of document composition. Before going further, note 
that document portability and exchangeability can only be ensured by composition 
formats that are independent of any production system. Moreover, reusability 
can be obtained through the definition of generic models. 

In the next subsections, we describe models and representative languages for 
the composition of these dimensions. The temporal dimension is presented in 
depth in the last section of this document. 

2.3 Models and Languages for Representing Logical Structures of 
Documents 

Models for representing logical structures of documents are based on: 

— Basic objects (that cannot be decomposed). 

— Composite objects obtained by composition of basic or composite objects. 

— Attributes associated with objects (to add semantics). 

With such a model, a document is organized as a tree structure (such as the 
tree representation of a book in Fig. 1) in which the leaves are the basic elements 
representing the "content" of the document. Basic and composite objects are 
typed. 

Note that in traditional word processors, document structures are 
linear (basically, lists of titles and paragraphs). By contrast, documents 
represented in a hierarchical and typed way are called structured documents. 
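
As a minimal sketch of such a typed tree, the book of Fig. 1 can be built with Python's standard `xml.etree.ElementTree` module. The element names follow the figure; the element name `para` and the way the five paragraphs are split between the two chapters are assumptions made for the example, and `leaves` and `outline` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Composite objects are elements with children; basic objects are the leaves.
book = ET.Element("book")
ET.SubElement(book, "title").text = "Mme Bovary"
ET.SubElement(book, "author").text = "Stendhal"
chapters = ET.SubElement(book, "chapterList")
for texts in (["Para1", "Para2"], ["Para3", "Para4", "Para5"]):  # assumed split
    chapter = ET.SubElement(chapters, "chapter")
    for text in texts:
        ET.SubElement(chapter, "para").text = text


def leaves(node):
    """Return the basic objects (elements without children) in document order."""
    return [element for element in node.iter() if len(element) == 0]


def outline(node, depth=0):
    """Return one indented line per element, showing its type."""
    lines = ["  " * depth + node.tag]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(book)))
print([leaf.text for leaf in leaves(book)])
```

The leaves carry the content, while every inner node only carries a type and an ordered list of children.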



Main Principles of Generic Logical Structures. Languages for defining 
documents with such typing principles are called markup languages because the 
format intertwines type information (marks or tags) with document content 
(basically, the text). For instance, the document of Fig. 1 is defined by: "<book> 
<title> Mme Bovary </title> <author> Stendhal </author> <chapterList> 
<chapter1> ....". 
[Figure: tree rooted at book, with children title ("Mme Bovary"), author ("Stendhal") and chapterList; chapterList has children chapter1 and chapter2, whose leaves are Para1 to Para5.] 

Fig. 1. Logical structure of a book 



Due to the great variety of documents (novels, articles, letters, etc.), it was 
not possible to define a universal markup language including all types of 
documents authors may create. Instead, languages called generic markup languages 
have been defined to specify classes of documents. These languages define 
grammars to which documents conform. 

These principles are nowadays widely applied thanks to the SGML, ODA 
and XML standards. 

SGML/XML. SGML |E|, Standard Generalized Markup Language, is an ISO 
standard (ISO 8879:1986) that aims at providing a formal notation for defining 
the grammar of a class of documents, called a DTD (Document Type Definition). 

This standard has not only permitted the emergence of specific DTDs adapted 
to different application domains (CALS, TEI, HTML), but it has also been used 
for the definition of new standards: 

— HyTime jH]j (for hypermedia documents) . 

— SDML & SSML (for sounds). 

— XML |21| (eXtensible Markup Language), which can be considered as an 
improvement of SGML. 

SGML/XML principles SGML provides a descriptive markup instead of a 
procedural one. This allows the separation of the "structured-content" part from 
any information associated with specific processing (formatting, information 
retrieval, etc.). 

As such a marking is a way to type parts of documents through grammar 
rules (as given by the DTD), typing techniques can be applied to documents: 
syntactic checking, homogeneous processing. 

The standard allows independence from character formats thanks to a string 
substitution mechanism ("SGML entities"). 

XML situation XML is a recommendation proposed by the W3C (World Wide Web 
Consortium) for a new markup language that aims at taking into account new needs 
for document exchange on the web (more structured documents, carrying more 
semantics). This specification is an evolution of SGML in the sense that some 
SGML features are not allowed (mainly omitted tags and inclusions/exclusions) 
and some extensions are introduced (naming conventions for modularity, links, 
empty elements). 

A major difference between SGML and XML is that XML allows the existence 
of two kinds of documents: well-formed documents which don’t always have a 
DTD, and valid documents, which do. 
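
The first of these two notions can be checked without any DTD. The sketch below uses the parser of Python's standard `xml.etree.ElementTree` module, which checks well-formedness but does not validate against a DTD; `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True when `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested tags: well-formed, even though no DTD is involved.
print(is_well_formed("<book><title>Mme Bovary</title></book>"))
# The title element is never closed: not well-formed.
print(is_well_formed("<book><title>Mme Bovary</book>"))
```

Validity is a stronger property: a valid document is well-formed and, in addition, conforms to the grammar given by its DTD.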

Among the subjects addressed by the W3C XML Working Group, the 
modeling activity has been split into 4 items: 

— Data model, the core for modeling the information contained in an XML 
document. 

— Namespaces for relating names in XML documents with Uniform Resource 
Identifiers (URIs), in order to associate the local names with global identi- 
fiers. 

— XLink (XML Linking Language) and XPointer (addressing language for 
pointing into documents), for specifying constructs to describe simple or 
complex links between objects (this is an activity of the XLL Working Group). 

— Structural Schemas, for associating constraints to documents. 

The XML syntax plays a central role in the activity of W3C for defining new 
recommendations in different domains of the web. For instance, the XML syntax 
is used in: 

— Resource Description Framework (RDF), the language for representing 
metadata. 

— Synchronized Multimedia Integration Language (SMIL) E3, for multimedia 
documents. 

— Document Object Model (DOM), for the definition of an application 
programming interface that allows active manipulation of the structure, 
presentation and content of XML and HTML documents. 



Specific Structures. Each DTD defines a specific class of documents. For 
example, a DTD for describing simple books such as the document of Fig. 1 could 
be defined as follows: 



<!ELEMENT book (title, author, chapterList) > 
<!ELEMENT chapterList (chapter)+ > 
<!ELEMENT chapter (para)+ > 
<!ELEMENT (title | author | para) (#PCDATA) > 



Fig. 2. A simple DTD for books 
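
To make concrete what conformance to such a grammar means, the content models of Fig. 2 can be checked by hand-translating each one into a regular expression over the sequence of child element names. This is a sketch under that assumption, not how an SGML or XML validator is actually implemented; `CONTENT_MODELS` and `violations` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Content models of Fig. 2, rewritten by hand as regular expressions over the
# sequence of child element names (each name is followed by one space).
CONTENT_MODELS = {
    "book": r"title author chapterList ",
    "chapterList": r"(chapter )+",
    "chapter": r"(para )+",
    "title": r"",   # #PCDATA: no child elements
    "author": r"",
    "para": r"",
}


def violations(element):
    """Return the types of the elements whose children break their content model."""
    found = []
    model = CONTENT_MODELS.get(element.tag)
    children = "".join(child.tag + " " for child in element)
    if model is None or re.fullmatch(model, children) is None:
        found.append(element.tag)
    for child in element:
        found.extend(violations(child))
    return found


valid = ET.fromstring(
    "<book><title>T</title><author>A</author>"
    "<chapterList><chapter><para>p</para></chapter></chapterList></book>"
)
invalid = ET.fromstring(
    "<book><title>T</title><chapterList><chapter/></chapterList></book>"
)
print(violations(valid))
print(violations(invalid))
```

The second document is well-formed but not valid with respect to Fig. 2: the book has no author and the chapter has no paragraph.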



Numerous application domains have developed DTDs. As an illustration, we 
list some representative SGML or XML DTDs: 

— CALS: Computer-aided Acquisition and Logistic Support, defined by the 
American DoD for technical documentation. 

— TEI: Text Encoding Initiative, for encoding a wide variety of commonly 
encountered textual features in literary and linguistic documents. 

— HTML: HyperText Markup Language, which has evolved from basic text 
and hyperlink features for the web to the HTML 4.0 Specification |2BI, which 
supports more multimedia options, scripting languages, style sheets, better 
printing facilities, etc. 

— ISO 12083, for scientific documents, defined by the Association of American 
Publishers and the European Physical Society. 

— MathML: this W3C Recommendation [22] is a low-level XML format for 
describing mathematics as a basis for machine-to-machine communication. 
It can be used to encode both mathematical notation, for high-quality visual 
display, and mathematical content, for more semantic applications. 



2.4 Models and Languages for Representing Physical Structures of 
Documents 

Principles. Among the typographical properties (or presentation properties) 
that characterize the graphical aspect of documents, we can identify two subsets: 

1. Properties depending on the content to be laid out, like fonts, color or ty- 
pefaces. We call these properties the style. 

2. Properties depending on the output medium, such as the size of pages, co- 
lumns, margins and gutters; we call these properties physical structure pro- 
perties. 

The expression of presentation properties has evolved in many directions, 
from low-level commands interspersed within the text (troff, LaTeX) to style 
sheets associated with documents in interactive editors (Word, Author/Editor), 
proprietary style sheet languages (Panorama and Thot P language ^) and 
standard languages (CSS |29j, DSSSL |0| and XSL El). This evolution follows 
the evolution of document models, from weakly structured document models to 
structured document models that contain no presentation information. 

With structured documents, the formatting process produces a representation 
of the document ready to be output (displayed or printed) from the internal 
representation of that document (its content and logical structure) and the 
associated presentation properties. One key point of structured document models 
is their ability to associate presentation properties with document element 
types, allowing inheritance of properties based on the structural hierarchy Pj. It is 
worth noting that style properties can be easily related to the logical structure, 
unlike physical structure properties. In fact, the physical structure of a 
document can be seen as a hierarchical organization of boxes (see Fig. 3) as defined by 
Knuth's box model; therefore formatting structured documents implies merging 
two hierarchical structures: the logical one and the physical one nni. 
Fig. 3. Hierarchical physical structure of document 



Cascading Style Sheets Language. In this part, we describe the CSS1/CSS2 
suite izni of Cascading Style Sheets, the W3C style sheet languages that have been 
defined for HTML documents. The CSS2 Recommendation follows and completes 
the CSS1 Recommendation, mainly by supporting media-specific style sheets 
(browsers, aural devices, printers, etc.) and other high-level formatting features 
such as content positioning, table layout, and automatic counters and numbering. 

Basic concepts CSS is a simple declarative style sheet language for HTML 
documents that allows style properties to be associated not only with instances 
but also with element types, so that properties can be applied to all elements of 
the same type. Moreover, CSS syntax allows a clear separation between content 
and presentation. 

Properties A property (color, margin, font, etc.) is assigned to a selector in order 
to manipulate its style. Example: color: red; 

Selectors Selectors can be defined in one of these three ways: 

— HTML element: p { text-indent: 3em } 

— Class selectors: code.new { color: green } with class attribute: 

<code class="new"> ... </code> 

— ID selectors (with ID attribute): #nb554 { font-weight: bold } 

Inheritance An inner element inherits the property values of the element 
that surrounds it unless they are otherwise modified. There are some exceptions, 
however: for example, the margin-top property is not inherited. 
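
The inheritance rule can be sketched as a lookup that walks up the element hierarchy for inherited properties only. The `Node` class, the `computed` function and the small `INHERITED` set are hypothetical names for this illustration; a real CSS engine handles many more properties and cases.

```python
# Properties that propagate from an element to its descendants in this sketch.
# margin-top is deliberately absent: as noted above, it is not inherited.
INHERITED = {"color", "font-family"}


class Node:
    """A document element with its own style declarations and a parent link."""

    def __init__(self, tag, parent=None, style=None):
        self.tag = tag
        self.parent = parent
        self.style = dict(style or {})


def computed(node, prop, default=None):
    """Return the value of `prop` for `node`, inheriting it when allowed."""
    if prop in node.style:
        return node.style[prop]
    if prop in INHERITED and node.parent is not None:
        return computed(node.parent, prop, default)
    return default


body = Node("body", style={"color": "red", "margin-top": "2em"})
paragraph = Node("p", parent=body)

print(computed(paragraph, "color"))            # taken from body
print(computed(paragraph, "margin-top", "0"))  # falls back to the default
```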

Stylesheet access Style rules applying to the elements of a document can be 
put either directly in the head part (with a style element) or in a separate file 
(with extension .css) that is referenced with a link element, as in: 

<link rel=stylesheet href="name.css" type="text/css"> 

In both cases the rules are CSS rules; for instance, embedded in a style element: 

<style type="text/css"> 
<!-- 
h4 { font: 14pt "Times"; font-weight: bold; color: green } 
h2 { font: 16pt "Times"; font-weight: bold; color: blue } 
p  { font: 12pt "Times"; color: black } 
--> 
</style> 

Cascading stylesheets With these ways of attaching style rules, it is possible 
that several rules set a value for the same property of the same element. The 
question is then: which stylesheet definition takes precedence? 

The basic rule is the following: the most specific rule wins. However, it is 
possible to specify rules with an "!important" statement that will override normal 
rules. 
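
A simplified sketch of this precedence rule follows. It covers only simple selectors of the three kinds listed earlier and ranks declarations by the important flag, then by specificity, then by source order; the origin of the style sheet and the other details of the real CSS cascade are left out. `specificity` and `winner` are hypothetical names.

```python
import re


def specificity(selector):
    """Return (ids, classes, elements) for a simple selector such as 'code.new'."""
    ids = selector.count("#")
    classes = selector.count(".")
    element_name = re.split(r"[.#]", selector)[0]
    return (ids, classes, 1 if element_name else 0)


def winner(declarations):
    """Pick the value that applies among (selector, value, important) triples.

    All declarations are assumed to target the same property of the same
    element and to be listed in source order.
    """
    ranked = max(
        enumerate(declarations),
        key=lambda item: (item[1][2], specificity(item[1][0]), item[0]),
    )
    return ranked[1][1]


print(winner([("p", "black", False), ("p.note", "green", False)]))
print(winner([("p", "red", True), ("#nb554", "blue", False)]))
```

In the first call the class selector is more specific than the element selector; in the second, the important declaration overrides the more specific normal one.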



XSL. The CSS example above demonstrates that the principles presented earlier 
can be applied to a single tag set (HTML) for which limited display functions 
are required. DSSSL 0 and XSL |23 aim at providing a way to describe how to 
display a document marked up with arbitrary elements as defined with SGML 
or XML. Their main concepts are: 

— Declarative approach: a declarative specification describes the characteristics 
and constraints to be used by the formatter. By contrast, a procedural 
approach implements the formatting process itself. 

— Basic formatting structures called flow objects (character, paragraph, se- 
quence, page, group, link, etc.) having an associated set of formatting cha- 
racteristics that are applied to those objects. 

— Tree transformation mechanism, for the transformation of documents from 
one application to another. For XSL, the target application basically is the 
formatting process of XML documents: the transformation specifies how each 
element of a source tree (an XML document) is associated with flow objects 
that compose the target tree. 

— Complete style language for expressing formatting and other document pro- 
cessing specifications. Typographic requirements range from reordering or 
duplicating elements to complex page layouts. 

XSL is based on DSSSL for its basic principles as described above, but it 
uses XML syntax. It includes CSS-like style rules and an escape into a scripting 
language to accommodate more sophisticated formatting. 

Elements of the source tree are associated with flow objects through 
construction rules, each composed of a pattern and an action that specifies a resulting 
sub-tree of flow objects. Patterns provide a complete selector mechanism in 
order to identify applicable elements by their context within the source, such as 
element ancestry or descendants, attributes of an element, or the position of an element 
relative to its siblings. The action part of the rule describes the structure and 
the style properties of the flow objects that must be created. 
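
The idea of construction rules can be illustrated with a small tree transformation written in Python rather than in XSL syntax. Everything here is an assumption made for the sketch: the rule table, the restriction of patterns to an element type plus an optional parent type, and the dictionary representation of flow objects.

```python
import xml.etree.ElementTree as ET

# Each rule: (element type, required parent type or None, flow object, style).
# The first matching rule applies, so more specific patterns come first.
RULES = [
    ("title", "book", "paragraph", {"font-size": "16pt"}),
    ("para", None, "paragraph", {"font-size": "12pt"}),
    ("book", None, "sequence", {}),
]


def to_flow_objects(element, parent_tag=None):
    """Build the target tree of flow objects for `element` and its descendants."""
    for tag, required_parent, flow, style in RULES:
        if element.tag == tag and required_parent in (None, parent_tag):
            break
    else:
        flow, style = "sequence", {}  # default action: a plain container
    return {
        "flow": flow,
        "style": style,
        "text": (element.text or "").strip(),
        "children": [to_flow_objects(child, element.tag) for child in element],
    }


source = ET.fromstring(
    "<book><title>T</title><chapter><para>p</para></chapter></book>"
)
target = to_flow_objects(source)
print(target["flow"], [child["flow"] for child in target["children"]])
```

The pattern part selects elements by type and context; the action part creates the flow object and its style properties.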

2.5 Models and Languages for Representing Hypermedia Structures 
of Documents 

Principles. Links aim at representing semantics that cannot be expressed by 
structural relationships. Typical examples are notes and references in documents. 
Links can be defined inside a document (internal links) or between documents 
(external links) providing a hypertext organization of the information that can 
be used by navigation applications. The most widespread application of this 
nature is the web itself. 

The underlying model for hypertext structures is basically a graph where 
nodes represent document elements and arcs represent the links between them. 
This structure is orthogonal to the logical structure of the document. 
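
A minimal sketch of this graph model, kept separate from a logical tree, is given below. The element names and the two tables `PARENT` and `LINKS` are invented for the illustration.

```python
from collections import deque

# Logical structure: every element has one parent (a tree).
PARENT = {
    "chapter1": "book", "chapter2": "book", "bibliography": "book",
    "para1": "chapter1", "para2": "chapter2", "entry1": "bibliography",
}

# Navigational structure: arcs between arbitrary elements (a graph).
LINKS = {
    "entry1": {"para1", "para2"},
    "para1": {"entry1"},
    "para2": {"entry1"},
}


def reachable(start):
    """Return all elements that can be reached from `start` by following links."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for target in LINKS.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def crosses_structure(source, target):
    """Return True when a link joins two elements that are not parent and child."""
    return PARENT.get(source) != target and PARENT.get(target) != source


print(sorted(reachable("para1")))
print(crosses_structure("entry1", "para1"))
```

The link from the bibliographic entry to a paragraph connects two elements that are far apart in the tree, which is exactly what the logical structure alone cannot express.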

As an illustration of these principles, we briefly describe the hyperlinking aspects 
of HyTime and XLink, the W3C proposal for hyperlinks. 

HyTime and XLink. The HyTime standard m is an SGML application (it 
uses SGML syntax) that can be used for hypertext and temporal specifications. 
Only hyperlinking facilities are described here; the temporal aspects of HyTime 
are left for later in the paper. 

The W3C is working on the definition of the XML Linking Language 
(XLink) 1^ for the specification of link structures inside XML resources. 
HTML, HyTime and TEI P3 are the three standards that provide the ground 
material of the XLL working group. More precisely, XLink uses the same basic concepts 
as HyTime for link specification. We have chosen the XLink vocabulary for 
presenting these concepts. 

XLink allows the specification of both simple unidirectional links (similar to 
HTML links) and complex multidirectional, typed links. 

Basically, a link is an explicit relationship between two or more local or 
remote resources that are reachable by the use of a locator. When a link is 
traversed (by a user action or by a program), a resource of the link is accessed. 

Links are defined by linking elements that can be recognized by applications 
thanks to a specific attribute named xml:link that can take one of the two 
values: simple or extended. Other attributes can be defined to associate 
information with a linking element: role, locators of remote resources and semantics for 
local and remote resources (specific role, title and behavior when traversed). 

A locator is specified by a Uniform Resource Identifier (URI), which identifies the 
document, together with an XPointer, which points to a fragment within the document. 
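
The two parts of a locator can be separated with Python's standard `urllib.parse.urldefrag`. The sketch assumes that the pointer part follows a `#` in the locator and treats it as an opaque string; the example URI and the helper name `split_locator` are hypothetical.

```python
from urllib.parse import urldefrag


def split_locator(locator):
    """Split a locator into (document URI, pointer part after '#')."""
    uri, fragment = urldefrag(locator)
    return uri, fragment


print(split_locator("http://example.org/book.xml#chap2"))
print(split_locator("http://example.org/book.xml"))
```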

3 Structured Documents Centered Applications 

In order to illustrate how the models and languages presented above are used in 
applications, we describe below one class of applications, namely editing 
applications. We then point out new problems raised by structured documents 
and DTD management when DTDs change, and we show how transformation 
techniques can bring solutions to them. 

3.1 Editing Tools 

Editors based on structured models maintain in memory a logical representation 
of the document which is used for editing operations. Thanks to this information, 
the editor guides and controls the user according to the generic structure of the 
document being edited. In particular, the editor prevents the user from producing 
a document whose specific structure would not be consistent with the generic 
structure. 

With a structured model of documents, the formatting process produces a 
representation of the document ready to be output (displayed or printed) from 
the internal representation of that document (its logical structure) and the style 
and physical structure properties. 

However, an important reason that limits the use of structured document 
models in document production is the difficulty of developing an editing tool 
with both logical and physical document representations. Some tools provide 
structured editing functionalities with poor formatting capabilities while others 
provide more sophisticated formatting operations but no interactive manipulation 
(e.g. LaTeX). Mixing complex formatting functionalities together with 
structured document models into an interactive authoring environment is still 
an open problem HS|. 

Thot Editor. Thot J7] is an experimental authoring system developed 
by the Opera project in order to validate the concepts of structured documents in 
an interactive environment. 

Thot is a system designed to produce structured documents. It allows the 
user to create, modify and consult interactively documents that comply 
with models. These models permit the production of homogeneous documents. 
Formatting and typography are handled by the system: the user can then focus 
on the organization and on the contents of documents. Thot performs other 
operations for the user such as numbering, updating cross references, building 
index tables, etc. 

Thot is an integrated and extensible system. It allows the user to process, with 
the same tool and within the same document, not only structured text but also 
graphics, complex tables, mathematical formulae, etc. 

Thot is also an open system. It is able to exchange documents with other 
systems through a flexible exporting tool, for example, to convert documents 
into LaTeX and HTML. It can also be included in other applications through its 
programming interface. 

Amaya Editor. Amaya ^H| is the W3C test-bed browser/authoring tool that 
is used to demonstrate and test many of the new developments in Web protocols 
and data formats. 
It has been developed on top of Thot technology, taking advantage of its 
features such as structure management, multiview display and multiple 
presentation handling (for screen and paper). But Amaya is much more than a simple 
editing tool: it is a complete web browsing and authoring environment for web 
documents. For instance, a transformation service is included in the tool, 
allowing the author to change the structure of some parts (lists into tables) or to 
easily edit mathematical expressions by successive structure changes. 

Amaya demonstrates recent web standards such as: (1) support for CSS 
m, which allows Amaya to display documents with style sheets and to create 
or edit style sheets; and (2) a prototype implementation of MathML [22], which 
allows users to browse and edit web pages containing mathematical expressions. 

3.2 Transformation of Structured Documents 

A major drawback of structured documents comes from the basic principle: each 
document must have a specific logical structure which is consistent with the 
corresponding generic structure. This implies that: (1) any change in a DTD 
can have heavy consequences on existing document bases and (2) any change in 
a document can be done only if the generic structure allows it. In both situations, 
transformations have to be performed. 

DTD Management. The logical structure of a document type can evolve. 
For various reasons it may be necessary to declare new elements, to remove 
elements that have become useless in some type of document, or to arrange 
existing elements in a different order. These changes lead to new versions of 
generic structures and the user has to specify into which new type each old type 
has to be transformed. 

The problem is then to recover documents built with old versions of a generic 
structure that has evolved. As a number of such documents may exist, it is 
necessary to transform them automatically, to make them consistent with the 
new generic structure. This kind of operation is called a static transformation 
because it is usually performed outside an editing session. Filters are typical 
tools that are used in such situations but they: 

— need a specific development for each DTD transformation, 

— require an exhaustive description of the translation of each type, 

— and imply either simple expressions, which only allow limited transformations 
(such as the Balise 0 and Cost 0 tools), or complex expressions 0, which lead 
to powerful transformations. 

Another approach to the transformation of DTD is the automatic one. This 
approach is based on the comparison between the document to be transformed 
(source) and the target DTD, using a matching algorithm to find a relation 
between the structures E0|. However, pure automatic techniques are unable to 
provide the right results in some situations. Therefore, we study an approach, 
called semi-automatic transformation, which tries to get the advantages of both 
filters and automatic transformation. 
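
The filter approach described above can be sketched as a static transformation driven by an explicit table giving, for each old type, the type it becomes in the new generic structure. The table contents (`TYPE_MAP`, `REMOVED`) and the function name `migrate` are hypothetical; the sketch shows the kind of exhaustive, per-DTD description that filters require, not any of the cited tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical evolution of the generic structure: two element types are
# renamed and one has become useless and is removed.
TYPE_MAP = {"chapterList": "body", "para": "p"}
REMOVED = {"author"}


def migrate(element):
    """Return a copy of `element` that conforms to the new element type names."""
    new = ET.Element(TYPE_MAP.get(element.tag, element.tag), element.attrib)
    new.text, new.tail = element.text, element.tail
    for child in element:
        if child.tag not in REMOVED:
            new.append(migrate(child))
    return new


old = ET.fromstring(
    "<book><title>T</title><author>A</author>"
    "<chapterList><chapter><para>x</para></chapter></chapterList></book>"
)
print(ET.tostring(migrate(old), encoding="unicode"))
```

Such a filter has to be rewritten whenever the DTD changes again, which is the first drawback listed above.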
Editing Structured Documents. One limitation of current structured 
editing systems comes from the structural constraints on documents, which can be 
considered too rigid by users. For example, the familiar cut-and-paste command 
that allows the user to copy or cut a part of a document (the source) and to 
insert (or paste) it into another part (the target) of a document cannot be easily 
implemented in an interactive structured editing system, because the source and 
target elements may have different types. Moreover, the system 
must allow these types to be defined in different document models, when source 
and target elements are in different documents. 

To allow a cut-and-paste operation when types are different, the structure of 
the source element must be transformed to become consistent with the target 
generic structure. Usually, the user wants this transformation to be automatic when 
editing a document, as with an unstructured editing tool. However, the user 
may want to indicate preferences when several transformations are possible. 
This kind of transformation performed by an interactive editor is called a 
dynamic transformation. This problem is similar to type conversions as considered in 
programming languages or object-oriented databases. 

The main constraints that have to be taken into account when implementing 
a dynamic transformation tool are the following: 

1. The cut-and-paste operation must not lose any information while keeping as 
far as possible structural information. 

2. The types involved in the operation can be any types known in the system, so 
no pre-processing can be performed as in static transformations (see above). 

3. Performance is critical as the operation is interactive. 

Few studies have been made of the specific problem of document type 
transformations in interactive environments. The second constraint stated above has 
led us to explore an automatic transformation technique j2Dj. The automatic 
approach is based on the comparison between the document to be transformed 
(source) and the target DTD, using a matching algorithm to find a relation 
between the structures. 

4 Multimedia Documents: From Temporal Specification 
to Authoring Environments 

A multimedia document is defined as a set of (basic) objects spatially and tem- 
porally organized and on which a navigational structure can be set. Multimedia 
documents combine in time and space different types of elements like video, au- 
dio, still-picture, text, synthesized image, ... Compared to classical documents, 
multimedia documents are characterized by their inherent temporal dimension. 
Basic media objects, like video, have an intrinsic duration. Furthermore, media 
objects can be temporally organized by the author, which adds to the document a 
temporal structure called the temporal scenario. Such an entity can be rendered 
by a presentation engine by means of the output channels of the computer 
(screen and speaker). 
Today, authors of multimedia documents often have to be programmers 
because programming is the only way for them to specify the complex synchronization of 
their documents (Lingo scripts in Director [TJ documents, for example). But it 
is clear that in order to increase the popularity of such multimedia applications, 
computer-illiterate people must have direct access to multimedia document 
creation. That will also drastically reduce the production cost of multimedia titles. 

Within the past decade, numerous research works (Cmifed gg, Firefly g|, 
HTSPN 25, Isis 2Sji, Madeus 23) have presented various ways of specifying 
temporal scenarios, each focusing on a particular understanding of temporal 
synchronization. Some standards have also been defined for covering temporal 
specification needs: HyTime 23, MHEG 23 and SMIL |5D] are the most representative 
examples. Before describing them, we analyze the main features that 
are required for multimedia document environments. 



4.1 Multimedia Authoring Requirements 

The variety of multimedia approaches reflects the large number of requirements 
that have to be covered by a multimedia authoring system. But these needs are 
only partially fulfilled by existing applications. In order to give a structured and 
readable analysis, we only focus on authoring requirements. We group them in 
two main classes: expressive power and authoring capabilities. 

Expressive power The expressive power of an authoring system is related 
to the ability of the system to cover a broad range of temporal scenarios 
required by the author. This criterion is hard to measure since defining an 
acceptable level of expressive power is strongly dependent on author practice and 
experience. Authoring requirements can be classified into three sets: (a) the 
needs arising from the intrinsic nature of the objects composing multimedia 
documents, (b) those arising from their composition and finally (c) those related 
to hypermedia navigation. 

(a) A multimedia system must be able to handle a wide variety of basic ob- 
jects (text, sounds, images, videos, etc.) on which the author can set interactivity 
capabilities and temporal style definitions. 

(b) As far as expressive power is concerned, temporal composition aims at 
expressing any arbitrary ordering between temporal intervals corresponding to 
the different objects p. 

(c) Hypermedia navigation (see E3D) is performed through document 
interactions that can either be global interactions (like usual hyperlinks) or local 
interactions (the effect applies to a sub-part of the objects). 
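
For requirement (b), one common vocabulary for the possible orderings of two temporal intervals is the set of thirteen qualitative interval relations due to Allen. Using that vocabulary here is our assumption; the passage above does not name it. The sketch classifies a pair of intervals given as (start, end) tuples with start < end.

```python
# Inverse names for the six asymmetric base relations.
INVERSE = {
    "before": "after", "meets": "met-by", "overlaps": "overlapped-by",
    "starts": "started-by", "during": "contains", "finishes": "finished-by",
}


def _forward(a, b):
    """Return the base relation of a to b, or None when none of the six holds."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a1 < b1 < a2 < b2:
        return "overlaps"
    if a1 == b1 and a2 < b2:
        return "starts"
    if b1 < a1 and a2 < b2:
        return "during"
    if b1 < a1 and a2 == b2:
        return "finishes"
    return None


def relation(a, b):
    """Return the qualitative relation between intervals a and b."""
    if tuple(a) == tuple(b):
        return "equals"
    forward = _forward(a, b)
    return forward if forward is not None else INVERSE[_forward(b, a)]


print(relation((0, 5), (5, 65)))    # the logo meets the manager's picture
print(relation((65, 245), (0, 5)))  # the video comes after the logo
```

An authoring system with full expressive power in this sense lets the author state any of these relations between any two objects of the scenario.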

Authoring capabilities At this point, the relevant question is: how long does it take 
for an author to design a scenario? Authoring capabilities include the following 
criteria: 

— Adaptability to computer illiterate people; 
— Straightforward design of temporal composition, for example by allowing the 
user to specify the temporal relations in any order; 

— Adaptability to the incremental nature of the editing process, i.e. local 
modifications must have local consequences; 

— Abstraction and multimedia document model capabilities to help the author 
in the organization of his document (structuring) and to allow the reuse of parts 
of documents or templates; 

— Multigrids reading support for the access of the same document by different 
categories of readers (having different native languages or comprehension 
levels). 

One important research activity is the definition of good user interfaces for 
providing real end-user authoring tools. A good authoring environment will certainly
not result from simply packaging an existing programming language: not
only does the author have to deal with too many low-level specifications, but such
authoring tools also still impose slow development cycles because of the composition-
test process (as with MhegDitor which is based on a converter tool jS|). To 
break down this batch approach, the experiences gained with authoring static 
documents (see section 1^^ can be considered: the Wysiwyg paradigm has been 
proven to be the right basis on which to build editing interfaces. However,
such a paradigm cannot be directly applied inside multimedia authoring
applications due to the temporal dimension of multimedia documents.

In order to provide the author with good multimedia authoring tools, i.e.
tools close to the Wysiwyg paradigm, it is necessary to allow some form of direct
manipulation of the document in the presentation view (the display area where
the document is played). However, such direct editing has to be complemented by
other features acting on the presentation process (stop/resume) or provided through
new visual perception mechanisms in order to give the author some
global perception of the document. Moreover, the author needs more flexible
ways to navigate in the document, such as moving quickly to important
parts of the document or jumping from one relevant point to another. Such
features must be provided by high-level temporal access functions such as
direct time point access and different scales of fast forwarding and rewinding.



4.2 Multimedia Languages 

Multimedia languages can be classified into two main categories, operational and
constraint-based, which reflect how close the document description is to
the presentation level:

1. Operational approaches are based on the direct specification of the tempo- 
ral scenario of the document. The author specifies how a scenario must be 
executed: based on either a script language or an operational structure (tree 
or Petri-nets are good examples). Therefore the presentation phase directly 
implements the operational semantics provided by the used structure. All 
existing standards belong to this class of languages. 
2. Constraint-based approaches set the specification outside this operational
scheme. They are based on constraint programming and are characterized
by a formatting phase that computes starting times and durations, as required
by the scenario. This formatting phase can be seen as a compilation of
a declarative specification into an operational structure, which can be interpreted
by the presentation phase. Thus, the author specifies declaratively what scenario
he needs, without having to state how to obtain the result in terms of operational
actions.

In a previous paper El, we have shown that constraint-based approaches 
seem to be better suited to building powerful authoring tools and can
offer equivalent or higher expressive power than operational techniques:
the author does not have to give the duration of all the objects involved in
his document. The durations are computed by a temporal formatter, removing
the burden of this task from the author and allowing him to obtain reusable
scenarios. However, this formatting has to be time-efficient and must provide
the solutions desired by the author.
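
The role of the formatting phase can be illustrated with a minimal Python sketch. It is not the formatter of any particular system: the function name `format_scenario`, the two relations `equals` (same start, same duration) and `meets` (the second object starts when the first one ends), and the policy of choosing the shortest admissible duration are all assumptions made for illustration. Each object only declares a range of acceptable durations, and the sketch compiles the declarative relations into concrete start times and durations.

```python
def format_scenario(ranges, equals=(), meets=()):
    """Compile a declarative scenario into (start, duration) pairs.

    ranges: object name -> (min_duration, max_duration)
    equals: pairs (a, b) that must start together and last equally long
    meets:  pairs (a, b) where b must start exactly when a ends
    All names and relations here are hypothetical illustrations.
    """
    # Group objects linked by 'equals' (tiny union-find without ranks).
    parent = {name: name for name in ranges}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in equals:
        parent[find(a)] = find(b)

    # Intersect the duration ranges inside each group.
    low, high = {}, {}
    for name, (dmin, dmax) in ranges.items():
        root = find(name)
        low[root] = max(low.get(root, dmin), dmin)
        high[root] = min(high.get(root, dmax), dmax)
    for root in low:
        if low[root] > high[root]:
            raise ValueError(f"no common duration for the group of {root!r}")
    # Assumed policy: take the shortest duration every member accepts.
    duration = {name: low[find(name)] for name in ranges}

    # Groups that are not the target of a 'meets' relation start at time 0.
    start = {root: 0.0 for root in low
             if all(find(b) != root for _, b in meets)}
    pending = list(meets)
    while pending:
        ready = [(a, b) for a, b in pending if find(a) in start]
        if not ready:
            raise ValueError("cyclic 'meets' constraints")
        for a, b in ready:
            begin = start[find(a)] + duration[a]
            if start.setdefault(find(b), begin) != begin:
                raise ValueError(f"conflicting start times for {b!r}")
        pending = [c for c in pending if c not in ready]

    return {name: (start[find(name)], duration[name]) for name in ranges}


# The author never states how long the audio lasts: it is derived
# from the requirement that it runs together with the video.
schedule = format_scenario(
    ranges={"title": (5, 5), "video": (20, 20), "audio": (15, 30)},
    equals=[("audio", "video")],
    meets=[("title", "video")],
)
```

In this sketch the audio is given the duration of the video and both start when the title ends, although neither value was written by the author. A real formatter would handle many more relations and would let the author influence which of the admissible solutions is chosen.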



HyTime. With HyTime, temporal specification is expressed by placing temporal
events (begin and end instants of elements) on an absolute temporal axis.
Such an approach is relevant only if objects have a deterministic temporal
behavior; otherwise, it is not possible to define their temporal events in such an
absolute way.

The temporal specification of any basic object (text, video or audio) is con- 
sidered as one dimension of its Finite Coordinate Space (other dimensions can 
specify spatial positions). Time measurement can differ from one FCS to the 
other. 

HyTime is interesting for its integrated approach to the temporal, spatial and
hypermedia dimensions of documents. But its intrinsic complexity and its weak
temporal composition capabilities prevent the development of tools and applications
based on it. It is, however, worth noting that the most successful concepts
of HyTime, namely hypertext specifications, have been reused in other standards
such as XLink (see 12 . 011 . 



SMIL. SMIL (Synchronized Multimedia Integration Language) j3D| defines a ge- 
neral document format integrating different types of independent media objects. 
It illustrates operational approaches based on a tree structure. The organization 
of media objects in the document is given in terms of temporal composition: 
both sequential and parallel operators are available, together with synchronization
attributes that can be used to specify fine synchronization between objects.

The SMIL format is defined as an XML DTD, and hyperlinking follows the XLink
specification. A SMIL document is composed of two parts: the Head part that
contains information at document level (basically the spatial organization in 
terms of Regions) and the Body part that contains the document scenario. A 
scenario is a hierarchical structure of parallel or sequential schedules. 
The sequential operator expresses the sequential play of its child
objects. The attribute Loop can be used to specify a given number of iterations
of the sequential structure.

The parallel operator expresses the simultaneous play of its operands without
any constraint on the operand termination: by default, the end time of the
construct is defined by the maximum duration of the enclosed elements. This
semantics can be changed with the temporal attribute Endsync. For
instance, if Endsync=first, the duration is defined by the minimum duration of
the children (the others will be interrupted).
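
The operator semantics described above can be summarized in a short Python sketch. It only models the rules stated in the text (sum for a sequence, maximum for a parallel block, minimum when Endsync is set to first); the dictionary representation, the function name `duration` and the 30-second video are assumptions made for illustration and are not part of SMIL.

```python
def duration(node):
    """Duration of a scenario tree built from media, seq and par nodes.

    Assumes every seq or par node has at least one child and every
    media node has an explicit duration in seconds.
    """
    kind = node["kind"]
    if kind == "media":
        return node["dur"]
    child_durations = [duration(child) for child in node["children"]]
    if kind == "seq":
        # Children play one after the other; 'loop' repeats the sequence.
        return sum(child_durations) * node.get("loop", 1)
    if kind == "par":
        # Default: the block ends with its longest child.
        if node.get("endsync", "last") == "first":
            return min(child_durations)  # the others are interrupted
        return max(child_durations)
    raise ValueError(f"unknown node kind: {kind!r}")


def media(dur):
    return {"kind": "media", "dur": dur}


# Audio (20 s), text (5 s) and image (10 s) in parallel,
# followed by a video whose 30 s length is a made-up value.
par_a = {"kind": "par", "endsync": "last",
         "children": [media(20.0), media(5.0), media(10.0)]}
scenario = {"kind": "seq", "children": [par_a, media(30.0)]}

total = duration(scenario)                               # 20 + 30
shortened = duration(dict(par_a, endsync="first"))       # ends with the text
```

The sketch ignores fine synchronization attributes and objects whose duration is unknown in advance, both of which a real player has to handle.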

The following example illustrates the basic concepts of SMIL and its main syntactic
features:

<smil>
  <head>
    <layout type="text/smil-basic">
      <region id="title" left= ... />
      <region id="image" ... />
    </layout>
  </head>
  <body>
    <seq>
      <par id="A" endsync="last">
        <audio id="P" dur="20.0s"
               src="http://www.inria.fr/music.au"/>
        <text id="Name" region="title" dur="5.0s"
              src="http://www.inria.fr/text.html"/>
        <img id="Hello" region="image"
             src="http://www.inria.fr/hello.gif" dur="10.0s"/>
        <a id="HI" href="#Next" show="replace">
          <img id="Button" region="xx"
               src="http://www.inria.fr/button.gif"/> </a>
      </par>
      <video id="V" region="yy" src="http://www.inria.fr/v.mpg"/>
    </seq>
    <par id="Next"> <!-- Next part of the scenario -->
      ...
    </par>
  </body>
</smil>



Since its public release, SMIL has been implemented by numerous vendors:
new SMIL players have been announced (such as those of RealNetworks and CWI) and the first
authoring tools are beginning to appear (such as the VEON authoring tool).



5 Conclusion 

The multimedia authoring domain is still in its infancy, but let us bet that it will
expand considerably very soon. New standards such as SMIL should give a new
boost to this domain. Taking into account the distribution of multimedia objects
will become a great challenge in the years to come.

Another challenge is the emergence of solutions for providing authoring envi- 
ronments that allow the specification of the different dimensions of documents. 
The experiences gained with structured editing tools and multimedia environ- 
ments have to be merged for providing new solutions characterized by: 

— the tight-coupling of authoring and presentation functions allowing some 
forms of direct edition; 

— a way to allow the author to access and define each dimension of the docu- 
ments through several views. Views synchronization can be very helpful to 
provide accurate perception services on documents; 

— and the ability to let the author adapt navigation scales in the time space.
(Extended Abstract)



1 Introduction 

Software has become an indispensable part of most products and services. As a result 
the need to "engineer software" professionally with high quality at low cost has 
become important to all branches of industry. The supporting scientific discipline 
called "software engineering", on the other hand, has matured very slowly, and has 
only just now arrived at the verge of making a real contribution to truly 
professionalizing the "engineering of software". This presentation reviews the historic 
evolution of both the profession of "engineering software" as well as the scientific 
discipline of "software engineering", points out their symbiotic relationship, and 
closes with an outlook into a visionary future full of challenges for practitioners, 
researchers and teachers. 



2 The Profession of Software Engineering 

Today, most products and services of our daily lives depend highly on complex 
software. That means that product or service quality is impossible without software 
quality. This situation has led to increasing pressure on the profession of "engineering 
software" to transform quickly from a toy discipline (i.e., one hacked software for 
one’s own use) to a development discipline (i.e., one makes money by selling high- 
demand software without being held responsible for low quality), all the way to an 
engineering discipline (i.e., quality of software is treated like quality of any regular 
engineering product). 

In consequence this means that a sound scientific basis is needed for describing 
software products (i.e., software programming languages), for developing software 
(i.e., software development methods), for coordinating and managing software 
development (i.e., software development processes), and for assuring the desired 
qualities of software and improving them over time (i.e., quality assurance and
management approaches).

Key ideas used in the professional software engineering environment were (in 
historical order) 

• Software (or programming) languages (since the 1950s): low-level to high-level
languages, implementation to design and specification languages, unstructured to
structured languages, general to application-specific languages, etc.

• Software development methods (since the late 1960s): informal to formal/systematic
methods, monolithic to scalable methods both with respect to complexity and formality,
etc.

• Software development processes (since the late 1970s): life-cycle project models to
technical process models, isolated (individual) to integrated (team) process models,
static to dynamic process models, etc.

• Quality assurance and management approaches (since the late 1980s): qualitative to
quantitative quality assurance, subjective to objective management, improvement
by chance to TQM for software, etc.

In the presentation a more detailed review of the professionally used key ideas and 
technologies will be given. 

The main problem today is still that the useful integration of all these ideas and
technologies into a competence that contributes to solving the engineering problem of
a specific company is not well understood. There exist numerous success and failure
stories. However, there is little (re)usable knowledge about why a specific language or
method worked better or worse in different environments. We, as a community, are
surprised over and over again when methods proven to work in one environment fail in a
different environment. The main reason for this surprise is a fundamental
misunderstanding of the task of "engineering software". Professional software
development environments reacted intuitively and appropriately by not introducing many of
the existing, theoretically promising research languages and methods into practice,
without, however, understanding the deeper reason.



3 Characteristics of the Software Domain 

What is software engineering like? What can we learn from physics, manufacturing or 
social sciences? In truth, software engineering combines characteristics of all of the 
above. It is by nature an engineering discipline; however, it differs from
manufacturing in that it is a "design" rather than a "production" task, and it mainly
involves human-based processes. There exist many natural science "laws" about the
relationship between process and product; however, most of them need to be 
empirically validated. Finally, like in social sciences, many of the relationships cannot 
be explained without modeling the human problem solving process. All this explains 
why one-dimensional approaches like "mathematical transformation ideas" or 
"management-based approaches" in isolation had to lead to disappointing results. In 
the presentation a characterization of the software domain will be given and 
contrasted with other traditional disciplines. 
4 The Scientific Discipline of Software Engineering

How has the software engineering discipline (established 1979) responded to these 
industrial experiences and problems? For a very long time, the discipline evolved in 
rather independent parallel threads: 

• formal methods community: formal specification techniques, formal languages, 
formal verification, formal transformation, etc. 

• system modeling community: architectures, product line approaches, object- 
orientation, reuse frameworks, etc. 

• process community: methods, process models, process standards, life-cycle 
models, process-sensitive development environments, etc. 

As shown by their practical usage in industry, none of these communities was
able to elevate the engineering of software to a satisfactory "engineering level". When
the nature of software development, as characterized above, was slowly
understood, qualitative changes started to happen. The understanding growing out of
the realization that engineering of software is a human-based design process made it
obvious that the creation of software with high quality under changing environmental
characteristics required the choice of different languages, methods and/or processes.
This in turn required an understanding of which language, method or process 
promises what result under what environmental characteristics. Now many people 
realized that we had all engaged in producing new languages, methods and processes 
without understanding their effects. 

As a result, empirical studies emerged as an important sub-discipline of
software engineering. This has not only led to an addition to the already
existing three sub-disciplines, but also to a synergistic whole. Today, more and more
people consider the software engineering discipline as fundamentally "experimental"
and composed of

• formal methods community (see above) 

• system modeling community (see above) 

• process community (see above) 

• empirical studies community: experimental designs, quantitative methods, 
quantitative and qualitative modeling, etc. 

The experimental characteristic of our discipline requires the use of empirical studies 
to identify strengths and weaknesses of existing approaches in objective terms, derive 
potential for improvement, and evaluate the potential of new languages, methods and 
processes against such improvement goals. In the presentation, an overview of
existing research results within the three traditional sub-communities will be given, and
the need for changing the research paradigm will be motivated.



5 The Experimental Software Engineering Paradigm 

The empirical studies community has produced principles, methods and tools for 
planning and conducting empirical studies in software engineering. Fundamental 
contributions include methods for defining study goals, designing the appropriate 
experiments, quantifying observations and modeling phenomena based on 
measurement data. In addition, languages, methods and tools have been developed for
representing software engineering knowledge for reuse. In the presentation a short 
summary of the existing empirical studies methodologies will be provided. 

In order to illustrate the possible improvements to be gained by living the
experimental software engineering paradigm in both research and practice, realistic
examples will be provided. The research example comprises the understanding gained
regarding software inspections; the practice example comprises the improvements
which have been gained within the NASA SEL development environment.



6 Outlook 

Finally, a vision of "software engineering" both as a profession and as a scientific 
discipline will be painted. Within this vision, the challenges for practitioners, 
researchers and teachers/trainers will be pointed out. 
Abstract. We give a survey of recent theoretical results for communication
problems in point-to-point networks. This survey is based on the
previous surveys in fPYtZP) . 

Communication problems are studied as routing path systems satisfying 
given communication patterns in a network. Efficiency parameters of 
path systems such as congestion, dilation, stretch factor, compactness 
and buffer size are considered. We focus on the current research directions 
and the various techniques that are used. Open problems related to this 
line of research and an overview of several related research directions are 
given. 



1 Introduction 

Communication among the processors in a computer network is a fundamental 
task in distributed computing. Networks such as wide-area networks are typi- 
cally sparse point-to-point networks, consisting of a large number of processors, 
where each processor can directly communicate with only a few neighbors. Tele- 
communication networks, computer networks or the Internet are examples of 
networks that perform many communication requests simultaneously (such as
e-mails, account transactions or telephone calls). The efficiency of communication
operations therefore has a crucial impact on the effective performance of the whole
distributed network.

It is not surprising that great emphasis is devoted to the study of basic com- 
munication problems. Among the fundamental problems are efficient routing, 
broadcasting and gossipping in point-to-point networks. All these problems are 
currently being studied actively. 

In this survey we study communication problems in point-to-point networks. 
Communication problems are investigated as routing path systems satisfying 
given communication patterns in a network. This enables us to express broad- 
casting, accumulation, gossipping and permutation routing by one-to-all, all-to- 
one, all-to-all and 1-relation patterns, respectively. 

The quality of path systems is evaluated according to certain efficiency measures.
We focus primarily on efficiency parameters such as congestion, dilation,
stretch factor and buffer size. They are independent of any concrete implementation
of path systems in the network. We exploit the relationship among
these parameters for various path systems satisfying several significant communication
patterns, both on general networks and on some special interconnection
networks, including tori, hypercubes, cube connected cycles, butterflies and star
networks. This approach leads to a variety of interesting combinatorial problems
on path systems and their properties.

* This research has been partially supported by VEGA 1/4315/97.

Another important issue is related to the space efficient implementation of 
the path systems satisfying certain communication patterns on networks. We 
consider two compact schemes: interval routing schemes (IRS) |4tif5h] and multi- 
dimensional interval routing schemes (MIRS) j1 f)j . To measure space complexity 
of these compact schemes, we use the compactness measure. We relate the com- 
pactness to dilation, congestion and buffer size for IRS and MIRS on general 
networks and on certain well-known interconnection networks. We present some 
classical efficiency results (also summarized in [1.57125] 1 and the most recent com- 
plexity results in the field of space-efficient communication schemes. The main 
focus is on the current research directions and the various techniques that are 
used. Open problems related to this line of research are given as well as several 
related research directions. 

2 Communication Problems 

Networks. An interconnection network is modeled by an undirected graph
$G = (V, E)$, where $V$ is a set of nodes and $E$ is a set of edges of the network.
Assume $|V| = n$. Each node has a finite set of buffers used for temporarily storing
messages.

Communication Patterns. A communication request is an ordered pair of
nodes $(u, v) \in V \times V$, $u \neq v$. A communication pattern $\mathcal{P}$ is a set of communication
requests, i.e. $\mathcal{P} \subseteq \{(u, v) \mid u, v \in V, u \neq v\}$. A set of communication patterns is
$\{\mathcal{P}_i\}_{i \in I}$, where each $\mathcal{P}_i$ is a communication pattern.

We shall consider several significant communication patterns in G. 

— A one-to-all communication pattern $\mathcal{P} = \{(v, w) \mid w \in V, w \neq v\}$ for a given
source node $v$.

— An all-to-all communication pattern $\mathcal{P}_a = \{(v, w) \mid v, w \in V, w \neq v\}$.

— A k-relation communication pattern $\mathcal{P}_k$ in which each node is the source and
the destination of at most $k$ requests. A permutation pattern is a 1-relation
$\mathcal{P}_1$.

In the static setting we consider static one-to-all and k-relation communication
patterns $\{\mathcal{P}\}$, where $\mathcal{P}$ is a one-to-all or k-relation communication pattern, respectively.
Similarly, in the dynamic setting we have

— dynamic one-to-all communication patterns $\{\mathcal{P}^{(i)}\}_{i \in I}$, where
$\mathcal{P}^{(i)} = \{(v, w) \mid w \in V, w \neq v\}$ for some (not fixed) source node $v$.

— dynamic k-relation communication patterns $\{\mathcal{P}_k^{(i)}\}_{i \in I}$.
Path Systems. Let $p(u, v)$ denote a directed path in $G$ from the node $u$ to
$v$, which consists of consecutive edges beginning at $u$ and ending at $v$. A path
system of $G$ is a set of directed paths between nodes in $G$. A path system $\mathcal{R}$
satisfies the communication pattern $\mathcal{P}$ if there is at least one routing path in $\mathcal{R}$
beginning in $u$ and ending in $v$ for each communication request $(u, v) \in \mathcal{P}$.

We distinguish between single and multipath systems. A path system $\mathcal{R}$
satisfying a given communication pattern $\mathcal{P}$ is single path (deterministic) if
there is exactly one path $p(u, v)$ in $\mathcal{R}$ for each $(u, v) \in \mathcal{P}$. It is multipath if
there can be many paths from $u$ to $v$ in $\mathcal{R}$ for each $(u, v) \in \mathcal{P}$. A path system
is simple (cycle-free) if no routing path contains the same node more than once,
and it is a shortest (optimal) path system if for each request $(u, v) \in \mathcal{P}$ only
shortest paths from $u$ to $v$ in $G$ are considered. A path system $\mathcal{R}$ satisfying $\mathcal{P}$ is
all shortest (all optimal) path if it contains all shortest paths between $u$ and $v$
for each $(u, v) \in \mathcal{P}$.

A cycle-free multipath system $\mathcal{R}$ is oblivious if for each $p(u, v), p(w, v) \in \mathcal{R}$,
$u \neq w$, where $p(u, v) = p(u, x) p_1(x, v)$ and $p(w, v) = p(w, x) p_2(x, v)$, also
$p(u, x) p_2(x, v) \in \mathcal{R}$ and $p(w, x) p_1(x, v) \in \mathcal{R}$.



Communication Problem. Let $G$ be a network and $\mathcal{P}$ a communication pattern
in $G$. The communication problem is specified by $G, \mathcal{P}$. A scheme for the
communication problem given by $G, \mathcal{P}$ is an implementation of a path system
satisfying the pattern $\mathcal{P}$ in $G$. In this overview we shall consider only two kinds
of compact schemes: interval routing schemes and multidimensional interval routing
schemes.

3 Efficiency Parameters 

Let $G$ be a network, $\mathcal{P}$ a pattern, and $\mathcal{R}$ a path system satisfying $\mathcal{P}$ in $G$. In
this section we study path systems satisfying the given patterns with respect to
dilation, congestion and deadlock freedom.



3.1 Dilation, Stretch Factor 

The efficiency of a path system is usually measured in terms of its dilation or
stretch factor. The (worst case) dilation of $\mathcal{R}$, denoted by $dilation(\mathcal{R})$, is the
length of the longest routing path in $\mathcal{R}$. The (worst case) stretch factor of $\mathcal{R}$,
denoted by $stretch(\mathcal{R})$, is the maximum ratio between the length of a routing
path in $\mathcal{R}$ and the distance between its endpoints.

Now consider single path systems satisfying the all-to-all communication pat- 
tern. 

The average dilation of $\mathcal{R}$ is

\[ \frac{1}{n(n-1)} \sum_{u \neq v} |p(u, v)| \]

for $p(u, v) \in \mathcal{R}$, where $|p(u, v)|$ is the length of the path $p(u, v)$.
The average stretch factor of $\mathcal{R}$ is

\[ \frac{1}{n(n-1)} \sum_{u \neq v} \frac{|p(u, v)|}{distance(u, v)} \]

for $p(u, v) \in \mathcal{R}$, where $distance(u, v)$ is the distance between $u$ and $v$ in $G$.
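
As a small worked illustration of these four measures, the following Python sketch evaluates a single path system for the all-to-all pattern on a cycle with four nodes. The adjacency-list representation, the function names and the choice of a purely clockwise routing are assumptions made for this example only.

```python
from collections import deque


def distances_from(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def path_system_metrics(adj, paths):
    """paths maps every request (u, v), u != v, to one path [u, ..., v]."""
    n = len(adj)
    lengths = {pair: len(path) - 1 for pair, path in paths.items()}
    ratios = {(u, v): length / distances_from(adj, u)[v]
              for (u, v), length in lengths.items()}
    pairs = n * (n - 1)  # number of requests in the all-to-all pattern
    return {
        "dilation": max(lengths.values()),
        "stretch": max(ratios.values()),
        "average_dilation": sum(lengths.values()) / pairs,
        "average_stretch": sum(ratios.values()) / pairs,
    }


# Cycle on 4 nodes; every message travels clockwise only.
n = 4
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
clockwise = {
    (u, v): [(u + step) % n for step in range((v - u) % n + 1)]
    for u in range(n) for v in range(n) if u != v
}
metrics = path_system_metrics(cycle, clockwise)
```

In this example the clockwise system has dilation 3 and stretch factor 3 (a neighbour in the counter-clockwise direction is reached in three hops), average dilation 2 and average stretch factor 5/3.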



3.2 Congestion 

For an edge $e \in E$, the edge-congestion $\pi(G, \mathcal{P}, \mathcal{R}, e)$ is the number of paths in $\mathcal{R}$
containing $e$. The maximum congestion of any edge of $G$ in the path system $\mathcal{R}$ is
called the edge-congestion of $G$ in the path system $\mathcal{R}$ satisfying the pattern $\mathcal{P}$, i.e.
$\pi(G, \mathcal{P}, \mathcal{R}) = \max_{e \in E} \pi(G, \mathcal{P}, \mathcal{R}, e)$. $\pi(G, \mathcal{P})$ denotes the minimum congestion of
$G$ in any path system $\mathcal{R}$ satisfying the pattern $\mathcal{P}$.
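
The definition of edge-congestion translates directly into a counting procedure. The Python sketch below compares two single path systems for the all-to-all pattern on a cycle with four nodes; the helper names and the two routing rules are assumptions chosen for illustration.

```python
from collections import Counter


def edge_loads(paths):
    """Number of paths using each undirected edge."""
    loads = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):
            loads[frozenset((u, v))] += 1
    return loads


def edge_congestion(paths):
    """Maximum load over all edges used by the path system."""
    return max(edge_loads(paths).values(), default=0)


def ring_path(u, v, n, clockwise_only):
    """Path from u to v on a cycle with nodes 0..n-1."""
    gap = (v - u) % n
    if clockwise_only or gap <= n // 2:
        return [(u + step) % n for step in range(gap + 1)]
    return [(u - step) % n for step in range(n - gap + 1)]


n = 4
requests = [(u, v) for u in range(n) for v in range(n) if u != v]
clockwise = [ring_path(u, v, n, True) for u, v in requests]
shortest = [ring_path(u, v, n, False) for u, v in requests]

congestion_clockwise = edge_congestion(clockwise)
congestion_shortest = edge_congestion(shortest)
```

On this small cycle the clockwise system loads every edge with 6 paths, whereas the shortest path system loads every edge with 4 paths, so the choice of the path system alone changes the congestion for the same pattern.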

Lemma 1. 

— There exists an n-node network G and a pattern P such that for each obli- 
vious single path system satisfying P the following holds: 

71 

AG,P.TZ) > -7t{G,P). 

— There exists an n-node graph G and a pattern P such that for each shortest 
path system P satisfying P the following holds: 

T) 

7t{G,P,TZ) > -7t{G,P). 



Competitive Ratio. A general framework was introduced in |H| to deal with 
congestion issues in the dynamic setting. Given a set $\{\mathcal{P}_i\}_{i \in I}$ of communication
patterns, a path system $\mathcal{R}$ and a set of path systems $S$ (all satisfying $\mathcal{P}_i$ for all
$i$), $\mathcal{R}$ is said to be c-competitive with respect to $S$ if

\[ \max_{i \in I,\ \mathcal{R}' \in S} \frac{\pi(G, \mathcal{P}_i, \mathcal{R})}{\pi(G, \mathcal{P}_i, \mathcal{R}')} \leq c . \]

The competitive ratio relates the behaviour of $\mathcal{R}$ with respect to any other
path system from $S$ on all communication patterns in $\{\mathcal{P}_i\}_{i \in I}$.

The natural question is how much one loses by using oblivious or shortest path
systems with respect to unrestricted path systems. Due to the previous lemma
we see that there exists an n-node graph $G$ and a set of communication patterns
such that any oblivious single path system is at least n/2-competitive with
respect to unrestricted path systems.
3.2.1 Edge and Vertex Forwarding Indices. The congestion of path
systems has been extensively investigated in the literature. In case of all-to-all
patterns, it corresponds to the notion of edge forwarding index introduced in |^.
Formally, for the all-to-all pattern $\mathcal{P}_a$, $\pi(G, \mathcal{P}_a)$ is called the edge-forwarding
index of $G$. A similar vertex forwarding index takes into account the load of nodes
in a network [7]. Various results on the minimization of the forwarding indices
for various interconnection networks have been obtained in lbtFil33l4,^l47H^ . 

The following theorem gives edge-forwarding indices for cycles $C_n$, complete
bipartite graphs $K_{n,m}$, hypercubes $Q_d$, cube connected cycles $CCC_d$, butterflies
$BF_d$, De Bruijn graphs $DB_d$ and d-dimensional tori.



Theorem 1. m The edge-forwarding index of 



- Cn is ^ for n even and ^ ^ ^ for n odd, 

\ • 2(n^+m^+nm— n— m) 

- Kr, rr), U > TTL, IS — 

- Qd IS 2^ 

- GGGd is |d22'^(l-o(l)), 

- BFd is |rf22'^-i(l + o(l)), 

- DBd is d2<^-\l - o(l)), 






maxi<i<d- 






where 7t(C'„.) is the edge-forwarding index of a cycle Gm- 



We say that $\mathcal{R}$ is of optimal edge-congestion if $\pi(G, \mathcal{P}_a, \mathcal{R}) = \pi(G, \mathcal{P}_a)$, i.e.
the edge-congestion of $\mathcal{R}$ is equal to the edge-forwarding index of $G$. The importance
of determining the exact values of forwarding indices is that they form lower
bounds on the congestion for restricted path systems (e.g. path systems induced
by IRS or MIRS schemes in Section 4).



3.3 Deadlock Free 

Given a source, a destination and a current buffer of a message, a buffer reser- 
vation controller specifies a set of buffers to which the message may move in the 
next step. The message can move to any of the specified buffers, provided that 
they are available. (It is assumed that each buffer is large enough to hold exactly
one message.) A deadlock is a situation in which a set of messages can never
reach the destination, because the specified buffers of each message are occupied by
other messages from the set. A buffer reservation controller is deadlock-free if it
does not allow the occurrence of a deadlock.

An orientation $DG$ of $G$ is a directed graph obtained from $G$ by replacing
each undirected edge in $G$ by an arc (i.e. an edge $\{u, v\}$ is replaced by either
$(u, v)$ or $(v, u)$). An orientation is acyclic if it does not contain a cycle. An
(alternating) orientation cover of a path system $\mathcal{R}$ is a sequence of (alternating
dual) orientations $DG_1, \ldots, DG_s$ such that every path $p \in \mathcal{R}$ can be expressed as
a concatenation of $s$ paths $p_1, \ldots, p_s$, where $p_i$ is a path in $DG_i$ for all $i$. An acyclic
orientation cover is an orientation cover consisting of only acyclic orientations.

Given a network $G$, let $\mathcal{R}$ be an all-to-all shortest path system of $G$. If there
is an acyclic orientation cover for $\mathcal{R}$ of size $s$, then there exists a shortest path
deadlock-free routing algorithm using only s buffers per vertex m- A routing 
algorithm using this strategy is said to be based on acyclic orientations. If there 
are no restrictions on the strategy used by the routing algorithm then it is said 
to be based on general strategy. 

3.3.1 Buffers. The following theorems provide necessary and sufficient con- 
ditions for the creation of deadlock-free packet routing algorithms, which are 
oblivious (i.e. every message is forced to take a single, fixed path based on its 
source and destination nodes). 

Theorem 2. m Given any oblivious packet routing algorithm, if there exists a 
total ordering of the buffers such that every message is always allowed to move 
to a higher ordered buffer, then the algorithm is deadlock-free.



Theorem 3. Given any deadlock-free oblivious packet routing algorithm, 
there exists a total ordering of the buffers such that every message is always 
allowed to move to a higher buffer. 
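
The ordering condition of Theorems 2 and 3 can be checked mechanically once the allowed buffer moves are written down as a directed relation: a total ordering in which every move goes to a higher buffer exists exactly when this relation has no directed cycle. The following Python sketch tests this with a topological sort from the standard library; the dictionary encoding of the moves and the buffer names are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def buffer_ordering(moves):
    """Return a total ordering of buffers compatible with all moves.

    moves maps each buffer to the set of buffers a message stored there
    may move to next. The result lists the buffers from lowest to
    highest, or is None when the moves form a cycle and no such
    ordering exists.
    """
    sorter = TopologicalSorter()
    for source, targets in moves.items():
        sorter.add(source)
        for target in targets:
            # 'source' must be ordered before (lower than) 'target'.
            sorter.add(target, source)
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


# Three hypothetical buffers used one after the other along a route.
chain = {"a0": {"b0"}, "b0": {"c1"}, "c1": set()}
# Two buffers whose messages wait for each other.
circular = {"a0": {"b0"}, "b0": {"a0"}}

ordering = buffer_ordering(chain)
no_ordering = buffer_ordering(circular)
```

For the chain the function returns the ordering a0, b0, c1, while for the circular relation it returns None, which corresponds to a configuration in which two messages can block each other.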

A great deal of research has been devoted to creating efficient deadlock-free 
routing algorithms llbll()lllll2ll3ll4l30b8l42iilti2fe3L™ . 



General Networks. The size of deadlock-free controllers for the optimal (shor- 
test path) packet routing on arbitrary networks strongly depends on the struc- 
ture of communication patterns. The following fact for all-to-all communication 
patterns can be found e.g. in m and is a consequence of a proposition proven 

in jn2|- 

Theorem 4. For any n-node network $G$ and a set of $n(n-1)$ shortest paths
connecting every pair of nodes in $G$, there is a deadlock-free controller of size
$D + 1$, where $D$ is the diameter of $G$.

The best lower bound on the size of general deadlock-free controllers is
$\Omega(\log \log n)$ [^. However, this lower bound is proved on a rather artificially
constructed network. The best lower bound on the size of deadlock-free controllers
for well-known interconnection networks is only 3 CDI. It would be interesting 
to find better upper and lower bounds on the number of buffers required for 
shortest path deadlock-free routing on general networks. Another interesting is- 
sue is to investigate the structure of networks having large size deadlock-free 
controllers for shortest path deadlock-free packet routing. 

Considering all-to-all communication patterns on arbitrary networks, an interesting
problem is to determine a non-constant lower bound on the size of a
deadlock-free controller (based on the acyclic orientation covering concept) necessary
for the optimal packet routing on well-known interconnection networks.

However, if we assume static one-to-all communication patterns, the requirements
for the size of deadlock-free controllers are much lower. Namely, for
any network $G$ and a set of $n - 1$ shortest paths connecting a source with all
other nodes in $G$, there is trivially a deadlock-free controller (based on acyclic
orientation covering) of size 1.

For other types of communication patterns the problems are again unsolved. 
What is the number of buffers sufficient to realize k-relation (permutation) 
communication patterns? Can we do better than $D + 1$ buffers per node? 

Specific Networks. We shall now concentrate on specific networks. All-to-all 
shortest path deadlock-free routing algorithms with a constant number of buffers 
per node are known for many important networks including meshes, tori, 
trees, hypercubes, de Bruijn and shuffle-exchange networks. 

Now consider the d-dimensional hypercube. Each node consists of a binary 
string of length d with two nodes being connected if and only if they differ in 
exactly one bit. Thus every path in the hypercube corresponds to a sequence of 
changes of some bits. If the bits are changed in order from left to right, then the 
path is called monotone. 
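
As an illustration of the definition above, the following sketch builds the monotone path between two hypercube nodes by correcting the differing bits from left to right. Nodes are represented as bit strings; the function name `monotone_path` is a hypothetical choice made for this example.

```python
def monotone_path(src, dst):
    """Return the monotone path from src to dst in the hypercube.
    src and dst are bit strings of the same length d; the bits in which
    they differ are changed one at a time, from left to right."""
    assert len(src) == len(dst)
    path = [src]
    cur = list(src)
    for i, (a, b) in enumerate(zip(src, dst)):
        if a != b:
            cur[i] = b                 # flip bit i, moving to a neighbour
            path.append("".join(cur))
    return path

path = monotone_path("0110", "1100")
```

The number of arcs on the path equals the Hamming distance of the two strings, so every monotone path is a shortest path.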

Theorem 5. Any deadlock-free "dimension-order" routing algorithm on a 
d-dimensional hypercube Qd uses at least f -I- 1 buffers. 

We concentrate on comparing general deadlock-free controllers versus 
deadlock-free controllers based on acyclic orientation coverings with respect to size. 

Theorem 6. Let $\mathcal{R}$ be an all-to-all shortest path system of a d-dimensional 
hypercube $Q_d$ with only monotone paths. Every orientation cover of $\mathcal{R}$ has size 
$\Omega(d/\log d)$. 

This recent result is an improvement over earlier work, where a weaker lower bound 
of the form $\Omega(\sqrt{d})$ on the size of acyclic orientation covering was proved on the 
same path system as in Theorem 6. An important consequence of this result is: 

Corollary 1. Every shortest path deadlock-free packet routing algorithm on 
$CCC_d$ based on acyclic orientations requires $\Omega(d/\log d)$ buffers. 

It is interesting to observe that there exists a shortest path deadlock-free 
routing algorithm for $\mathcal{R}$ (from Theorem 6) using only 8 buffers per node (which, 
of course, is not based on acyclic orientations!). In fact, a graph of size n 
has been presented which has a shortest path deadlock-free packet routing algorithm 
using only $O(1)$ buffers per node, but for which every shortest path deadlock-free 
routing algorithm based on acyclic orientations requires $\Omega(\log n/\log\log n)$ 
buffers per node. Hence, the technique based on acyclic orientations sometimes does 
not lead to size optimal deadlock-free packet routing algorithms. It would be 
interesting to know how large the gap between routing algorithms based on acyclic 
orientations and general deadlock-free routing algorithms can be. 
Covering Problem. The following covering problem on the path systems is 
important for specifying deadlock-free controllers based on acyclic orientations. 
Given a network G, determine the size (denoted as rank) of an alternating 
acyclic orientation covering for the system of all shortest paths between all pairs 
of nodes in G. 

This covering problem has been studied previously, and it was shown that 
determining the rank is NP-complete in general. Furthermore, some known 
upper and lower bounds on the rank were improved for particular topologies, 
such as grids $G_{p,q}$, tori $T_{p,q}$ and hypercubes $Q_d$. 

Theorem 7. 

- [(2 - y%q\ - 1 < rank(Gp^q) < fg + o(g) for p> q 

- Lf J + 2 < rank{Tp^q) < \{~\ + 4, for p > q 

- r < rank{Qd) < d -I- 1 

We also present upper and lower bounds on the rank for cube connected 
cycles $CCC_d$ and butterflies $BF_d$. 

Theorem 8. 

- f2{d/\ogd) < rank{GGCd) <2d+a> 

- 3 < rank(BFd) < 4 

It would be worthwhile to establish the exact values for $q \times q$ grids (the 
conjecture is $(2 - \sqrt{2})q$), d-dimensional hypercubes (the conjecture is d) 
and d-dimensional cube connected cycles (the conjecture is $2d + O(1)$). The main 
unresolved problem is to determine rank values for other well-known interconnection 
networks and also for more general classes of networks. 



Greedy Controllers. Now consider greedy deadlock-free controllers. To introduce 
greedy controllers, we need to recall the definition of path covering. We say 
that an acyclic orientation sequence $Q = (DG_1, \ldots, DG_s)$ covers a simple path 
$p(v_1, v_r) = v_1, \ldots, v_r$ if there exists a sequence of positive integers 
$j_1, \ldots, j_{r-1}$ such that $1 \le j_1 \le \ldots \le j_{r-1} \le s$ and for every 
$i$, $1 \le i \le r-1$, the arc $(v_i, v_{i+1})$ belongs to $DG_{j_i}$. We see that a 
path p need not be covered by $Q$ in a unique way. There could be different sequences 
$k_1, \ldots, k_{r-1}$ such that $(v_i, v_{i+1})$ belongs to $DG_{k_i}$. We 
assume that the greedy deadlock-free controller based on $Q$ works with minimal 
$(r-1)$-tuples $(k_1, \ldots, k_{r-1})$ (minimal w.r.t. the lexicographical ordering). 
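
The choice of the lexicographically minimal tuple can be sketched as follows. The sketch assumes that the index sequence is required to be nondecreasing and that each orientation is given as a set of directed arcs; the function name `greedy_cover` and the example orientations of a 4-node ring are hypothetical. Taking, for each arc of the path, the smallest admissible index yields a tuple that is componentwise (and hence lexicographically) minimal among all covering tuples.

```python
def greedy_cover(orientations, path):
    """orientations: list [DG_1, ..., DG_s] of sets of directed arcs (u, v).
    path: list of vertices v_1, ..., v_r.
    Returns the minimal tuple (k_1, ..., k_{r-1}) with 1-based indices,
    or None if the sequence does not cover the path."""
    indices = []
    j = 0  # 0-based index of the orientation used for the previous arc
    for arc in zip(path, path[1:]):
        # advance to the first orientation, not before j, containing the arc
        while j < len(orientations) and arc not in orientations[j]:
            j += 1
        if j == len(orientations):
            return None
        indices.append(j + 1)
    return tuple(indices)

# Ring 0-1-2-3-0: one orientation directs arcs from lower to higher labels,
# the other one is its reverse.
up = {(0, 1), (1, 2), (2, 3), (0, 3)}
down = {(v, u) for (u, v) in up}
```

With the sequence (up, down, up), the path 3, 0, 1 is covered by the tuple (2, 3), whereas the shorter sequence (up, down) does not cover it.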

Theorem 9. There exists a deadlock-free greedy controller of size 2 for the 
optimal packet routing on a d-dimensional hypercube and of size 4 for the optimal 
packet routing on a d-dimensional torus. 

Due to Theorem 8 the size of a deadlock-free greedy controller for the optimal 
packet routing on $BF_d$ is at most 4. An interesting question is to determine the 
size of greedy controllers for other interconnection networks. 
4 Communication Schemes 

In this section we present communication schemes for efficient implementation 
of communication problems. The main emphasis is on the implementation of 
path systems satisfying given communication patterns, which is efficient w.r.t. 
the space, dilation and buffer size. 



4.1 Interval Routing Schemes 

An Interval Labeling Scheme (ILS) is a scheme of labeling each node in a graph 
G by a unique integer from the set $\{1, 2, \ldots, n\}$ and each arc by an interval 
$[a, b]$, where $a, b \in \{1, 2, \ldots, n\}$. We allow cyclic intervals $[a, b]$ so that 
$[a, b] = \{a, a+1, \ldots, n, 1, \ldots, b\}$ for $a > b$. The set of all intervals 
associated with the arcs incident to a node must form a partition of the set 
$\{1, 2, \ldots, n\}$. Messages to a destination node having a label $l$ are routed via 
the arc labeled by the interval $[a, b]$ such that $l \in [a, b]$. An ILS is valid if 
the path system specified by this ILS satisfies the all-to-all communication pattern, 
that is, if for all nodes u and v in G, messages sent from u to v reach v correctly, 
not necessarily via shortest paths. A valid ILS is also called an Interval Routing 
Scheme (IRS). An IRS thus specifies for each pair of distinct nodes u and v in G a 
(unique) path from u to v. 
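
To make the routing rule concrete, the following sketch builds one possible 1-ILS for an n-node ring (n at least 3) and forwards a message by interval lookup only. The labeling, with nodes numbered 1..n clockwise and the clockwise arc at each node covering the next ⌊n/2⌋ labels, is an assumption chosen for this example; the names `in_interval`, `ring_ils` and `route` are hypothetical.

```python
def in_interval(a, b, l):
    """True if label l lies in the (possibly cyclic) interval [a, b]."""
    return a <= l <= b if a <= b else (l >= a or l <= b)

def ring_ils(n):
    """A 1-ILS for the n-node ring, n >= 3, nodes labeled 1..n clockwise.
    Returns {node: {neighbour: (a, b)}}; at every node the two intervals
    partition {1, ..., n} (the node's own label falls into the second one)."""
    wrap = lambda x: (x - 1) % n + 1
    half = n // 2
    return {u: {wrap(u + 1): (wrap(u + 1), wrap(u + half)),
                wrap(u - 1): (wrap(u + half + 1), u)}
            for u in range(1, n + 1)}

def route(ils, src, dst):
    """Forward a message from src to dst using only the arc intervals."""
    path = [src]
    while path[-1] != dst:
        u = path[-1]
        nxt = [v for v, (a, b) in ils[u].items() if in_interval(a, b, dst)]
        assert len(nxt) == 1           # the intervals at u form a partition
        path.append(nxt[0])
        assert len(path) <= len(ils)   # guard against a routing loop
    return path

ils = ring_ils(6)
```

In this example every message arrives along a shortest path, so the scheme is a valid ILS, i.e. an IRS, for the ring.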

In a k-ILS each arc is labeled with up to k intervals, always under the 
assumption that at every node, all intervals associated with arcs going out from the 
node form a partition of $\{1, \ldots, n\}$. At any given node a message with destination 
node labeled $l$ is routed via the arc labeled by the interval containing $l$. If a k-ILS 
does not use cyclic intervals, the k-ILS is called linear or simply k-LILS. Valid 
k-ILS and k-LILS are called k-IRS and k-LIRS respectively. A k-IRS (k-LIRS) 
is said to be optimal if it represents a shortest path system containing exactly 
one shortest path between any pair of nodes. 



4.1.1 Compactness. To measure the space efficiency of a given IRS, we use 
the compactness measure, defined as follows. The compactness of a graph G, 
denoted as compactness(G), is the smallest integer k such that G supports a 
k-IRS of all-to-all single shortest paths, that is, a k-IRS that provides only one 
shortest path between any pair of nodes. 
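
For one fixed labeling and one fixed choice of shortest paths, the number of intervals needed per arc can be counted directly, as in the following sketch. Note that this only evaluates a given scheme; the compactness of G is the minimum of this quantity over all labelings and all choices of shortest paths. The function names and the example label sets are hypothetical.

```python
def cyclic_interval_count(labels, n):
    """Smallest number of cyclic intervals of {1, ..., n} whose union is
    exactly the set `labels`."""
    s = set(labels)
    if not s:
        return 0
    if len(s) == n:
        return 1
    # an interval starts at every label whose cyclic predecessor is absent
    return sum(1 for l in s if ((l - 2) % n + 1) not in s)

def scheme_compactness(arc_labels, n):
    """arc_labels: {arc: set of destination labels routed via that arc}.
    Returns the smallest k such that this particular scheme is a k-ILS."""
    return max(cyclic_interval_count(s, n) for s in arc_labels.values())

example = {"e1": {1, 2, 8}, "e2": {3, 5}}
k = scheme_compactness(example, 8)
```

Here the labels {1, 2, 8} form the single cyclic interval [8, 2], while {3, 5} needs two intervals, so this particular scheme requires k = 2.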



4.1.1.1 All-to-all Single Shortest Paths Schemes. Matching upper and 
lower bounds on the compactness of general graphs have been established. 

Theorem 10. 

— Every n-node graph G, $n \ge 1$, satisfies 

$$\mathrm{compactness}(G) \le \frac{n}{4} + 0.25\sqrt{2n \ln(3n^2)}$$

— For every sufficiently large integer n, there exists an n-node graph G such 
that 

$$\mathrm{compactness}(G) \ge \frac{n}{4} - 1.72\, n^{2/3} \ln^{1/3} n$$



A powerful technique for obtaining lower bounds on the compactness of shortest 
path interval routing on arbitrary graphs has been introduced in [22] and 
has also been used in subsequent work. 

The compactness of many graph classes has been studied. Its value is 1 for 
trees, outerplanar graphs, hypercubes and meshes, 8-directional 
meshes, r-partite graphs, interval graphs, and unit-circular graphs. 
It is 2 for tori, at most 3 for 2-trees, and at most $2\sqrt{n}$ for n-node 
chordal rings. More results on the compactness of concrete graphs can be 
found in the literature. 



It has been proved that compactness $\Theta(n)$ might be required on n-node 
random graphs. However, there are also certain well-known interconnection 
networks with large compactness, including shuffle exchange $SE_d$, cube connected 
cycles $CCC_d$, butterflies $BF_d$ and star graphs $S_d$. 



Theorem 11. 

— compactness($SE_d$) = $\Omega(n^{1/2-\varepsilon})$, for every $\varepsilon > 0$ 

— compactness($CCC_d$) = $\Omega(\sqrt{n/\log n})$ 

— compactness($BF_d$) = $\Omega(\sqrt{n/\log n})$ 

— compactness($S_d$) = $\Omega(n(\log\log n/\log n)^5)$ 



Following the same techniques, we can prove the lower bound on the 
compactness also for De Bruijn graphs $DB_d$. 

Theorem 12. compactness($DB_d$) = $\Omega(\sqrt{n/\log n})$ 



The question is whether the lower bounds on the compactness stated above for 
special interconnection networks can be improved. 



4.1.2 Compactness Versus Dilation. We now consider the compactness for 
dilation-bounded IRS. 



Special Networks. Asymptotically optimal trade-offs between the dilation 
and the compactness have been obtained for some special classes of graphs. The 
compactness threshold $\Theta(\sqrt{n})$ for the dilation $1.25D - 1$ has been proved on 
multiglobe graphs and the same threshold $\Theta(\sqrt{n})$ for the dilation D on planar 
multiglobe graphs (called globe graphs). Moreover, for globe graphs nearly-optimal 
routing (in the sense of $(1+\varepsilon)D$-bounded routing for any given constant 
$\varepsilon > 0$) is achievable with only constant compactness. 

The multiglobe graph (denoted as $M(s,t,r)$) is obtained from the complete 
bipartite graph $K_{s,t}$ by replacing every edge by its own path of length r. Hence, 
$K_{s,t} = M(t, s, 1)$. Its diameter is 2r, it has $(r-1)st + s + t$ vertices and $rst$ edges. 
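
The stated parameters of the multiglobe graph can be checked on small instances with the following sketch. The vertex naming scheme and the function names `multiglobe` and `diameter` are hypothetical; the diameter is computed by breadth-first search from every vertex, and the value 2r is only expected when both sides of the bipartite graph have at least two nodes.

```python
from collections import deque

def multiglobe(s, t, r):
    """Adjacency dict of M(s, t, r): the complete bipartite graph with sides
    of size s and t in which every edge is replaced by a path of length r."""
    adj = {}
    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for i in range(s):
        for j in range(t):
            # path a_i, p_1, ..., p_{r-1}, b_j with r - 1 fresh inner nodes
            chain = [("a", i)] + [("p", i, j, h) for h in range(1, r)] + [("b", j)]
            for u, v in zip(chain, chain[1:]):
                add_edge(u, v)
    return adj

def diameter(adj):
    """Largest shortest-path distance, by BFS from every vertex."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())
    return max(eccentricity(v) for v in adj)

s, t, r = 3, 2, 4
graph = multiglobe(s, t, r)
num_edges = sum(len(nb) for nb in graph.values()) // 2
```

For s = 3, t = 2, r = 4 the construction has 23 vertices, 24 edges and diameter 8, in agreement with the formulas above.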
Theorem 13. 

— There is a multiglobe graph $M(s,t,r)$ such that each k-IRS of M with the 
dilation bounded by $1.25D - 1$ needs $k = \Omega(\sqrt{n})$. 

— There is a $1.25D$-bounded 2-IRS of the multiglobe graph $M(s,t,r)$. 

— For any e > 0 there exists a k-IRS of the M(s,t,r) with the dilation bounded 
by (1 + e)D and k = \ ^e] • min{s, t). 

The globe graph (denoted as G(r, s), r odd) is a planar multiglobe graph 

M(s,2,r§l). 

Theorem 14. 

— Every optimal IRS of the globe graph $G(s, s+1)$ needs compactness $s/4$. 

— There is an optimal IRS of $G(r,s)$ with compactness $\min(s,r)$. 

— There is a $1.5D$-bounded 1-IRS of $G(r,s)$. 

— For arbitrary $\varepsilon > 0$ there is a $(1+\varepsilon)D$-bounded IRS of $G(r,s)$ with constant 
compactness. 

It would be interesting to achieve asymptotically optimal compactness-dilation 
trade-offs for other classes of interconnection networks (having large 
compactness requirements). 

Two interesting open problems related to the generalization of Theorem 14 
towards planar graphs have been posed. The first question is whether for any 
$\varepsilon > 0$ there is a constant k such that every planar graph G of diameter D satisfies 

$k$-dilation$(G) \le (1+\varepsilon)D$. 

Second, C. Gavoille posed the conjecture that every n-node planar graph has 
compactness $O(\sqrt{n})$. 



General Networks. Note that for every network there is an interval routing 
scheme with compactness 1 and of dilation 2D, where D is the diameter of the 
underlying network. For the dilation bounded interval routing on general 
graphs, the following nontrivial upper bound has been obtained with a 
non-constructive proof. 

Theorem 15. There is an interval routing scheme with the dilation $\lceil 1.5D \rceil$ 
and the compactness $O(\sqrt{n \log n})$ on n-node networks with the diameter D. 



A technique for lower bounds on the compactness of dilation bounded interval 
routing has been introduced and subsequently improved in a series of papers. 
Linear IRS have been treated separately. 

We summarize the best known lower bounds. For the compactness 1, the 
lower bound on the dilation in the form $2D - 3$ has been proved, which shows the 
optimality of the 1-IRS of dilation 2D. For the compactness k, $2 \le 
k \le O(\sqrt{n})$, a lower bound on the dilation in the form $3D/2 - 3$ is known. 
For the compactness k, $2 \le k \le \Omega(n/\log n)$, the following lower bound on 
the dilation has been proved. 
Theorem 16. 

— For every D >2, there exists an n-node graph G such that 




for every $k < O(n/(D \log(n/D)))$. 

— For every $D > 4 \log n$ there is a bounded degree n-node graph G of diameter 
D such that 



The question remains to determine a tight trade-off between the compactness 
and the (worst case) dilation for general networks. Another interesting issue 
is to exploit the relationship between the compactness and the average dilation 
(average stretch factor) for arbitrary networks. 

4.1.3 Compactness Versus Stretch Factor. The following routing scheme, 
which can be constructed in polynomial time, has been presented. 

Theorem 17. For every n-node graph G with diameter D, there exists 
an interval routing scheme on G such that 

— the compactness is at most $3\sqrt{n(1 + \ln n)}$, 

— the worst case stretch factor is at most 5, 

— the average stretch factor is at most 3. 

4.1.4 Compactness Versus Congestion. The competitiveness factor expresses 
how well the k-IRS behaves with respect to any other scheme on all input 
communication patterns. 

The natural question to ask is how much one loses using k-IRS with respect 
to unrestricted routing path systems. There exists an n-node graph G and a set 
of communication patterns such that any shortest path k-IRS for G is at least 
n/2-competitive. 

Moreover, there also exists an n-node graph and a communication pattern 
such that any optimal k-IRS, $k = O(1)$, is $\Omega(n)$-competitive with respect to 
non-optimal k-IRS. And finally, there exists an n-node graph and a communication 
pattern such that any optimal $\sqrt{n}$-IRS is $\Omega(\sqrt{n})$-competitive with respect to 
non-optimal $O(1)$-IRS. 

The main question remains whether there is an n-node graph and functions 
$f_1(n) \ll f_2(n)$, $g_1(n) \gg g_2(n)$ such that any $f_1(n)$-IRS is at least 
$g_1(n)/g_2(n)$-competitive with respect to $f_2(n)$-IRS. As a partial solution of this 
problem, for each fixed k there exists an n-node graph and a communication pattern 
such that the congestion of each path system induced by k-IRS is greater 
than the congestion of the best path system induced by (k + 1)-IRS, both 
satisfying a given communication pattern. 

For specific topologies, the next two propositions give basic results on com- 
petitiveness for one-to-all and all-to-all communication patterns. We see that 
matching or nearly matching upper and lower bounds on the competitive ratio 
hold for many interconnection networks. 

Theorem 18. 

— There exists a 1-competitive 1-IRS on chains and trees for arbitrary commu- 
nication patterns and on rings for dynamic one-to-all patterns. 

— There exists a 1-competitive 1-IRS on 2-dimensional grids and tori for static 
one-to-all patterns. 

— There exists a {l-\- -competitive 1-IRS on 2-dimensional tori for dynamic 
one-to-all patterns. 

Theorem 19. 

— There exists a 1-competitive 1-IRS on any ring for all-to-all patterns. 

— There exists a $(1 + o(1))$-competitive 1-IRS on any d-dimensional grid and 
a $(1 + o(1))$-competitive 2-IRS on any d-dimensional torus for all-to-all 
patterns. 

— There exists a $(1.2 + o(1))$-competitive 1-IRS on any 2-dimensional torus for 
all-to-all patterns. 

The following result on tori relates congestion to stretch factor, and is tight 
as the lower bound on arc-congestion for is 0.125n^. 

Theorem 20. The arc-congestion of any all-to-all path system induced by 
1-IRS on 2-dimensional tori is at most 0.15n^ -I- o(n^), with stretch factor 
at most 2.2. 

4.1.5 Compactness Versus Buffers. A (k, s)-DFIRS (deadlock-free IRS) for 
a graph G is a k-IRS for G together with a deadlock-free controller of size s for G 
which covers the all-to-all single shortest path system induced by the k-IRS. As 
all controllers in DFIRS are based on the concept of acyclic orientation covering, 
the orientations of edges can be saved at nodes of degree $\delta$ with additional $O(\delta)$ 
bits. 

We give upper bounds on the trade-offs between the compactness and the 
size of deadlock-free controllers for certain well-known interconnection networks. 
The next results concern hypercubes and tori. 

Theorem 21. 

— For every i ($1 \le i \le d$) there exists a $(2^{i-1}, \lceil d/i \rceil + 1)$-DFLIRS for a 
d-dimensional hypercube. 

— For every n and i {1 < i < d) there exists a ([n*/2],2 • \d/i'\ -\- 1)-DFLIRS 
on a d-dimensional torus. 
Note that when we consider linear interval routing schemes on d-dimensional 
hypercubes, the size $d + 1$ can be obtained with compactness 1, and the reduction 
to size 2 can be achieved with the compactness $2^{d-1}$. G. Tel posed 
the question whether it is possible to obtain the shortest path system induced 
by a (linear) interval routing scheme, which uses only two buffers per node for 
deadlock-free packet routing. We argue that there is no deadlock-free linear 
interval routing scheme (based on acyclic orientations) on a d-dimensional hypercube 
of the compactness 1 and size 2. 

When we consider linear interval routing schemes on d-dimensional tori, the 
size $2d + 1$ can be obtained with the compactness 2, and the restriction to the size 
5 can be achieved with the compactness $O(n^{d-1})$. As there exists a deadlock-free 
controller of size 4 for the optimal packet routing on a d-dimensional torus, 
the existence of a better deadlock-free IRS remains an open question. 

4.2 Multi-dimensional Interval Routing Schemes 

Multi-dimensional interval routing schemes (MIRS for short) are an extension 
of interval routing schemes. In a (k,d)-MIRS every node is labeled by a unique 
d-tuple $(l_1, \ldots, l_d)$, where each $l_i$ is from the set $\{1, \ldots, n_i\}$ 
($1 \le n_i \le n$). Each arc is labeled by up to k d-tuples of cyclic intervals 
$(I_{1,1}, \ldots, I_{d,1}), \ldots, (I_{1,k}, \ldots, I_{d,k})$. 
In any node a message with destination $(l_1, \ldots, l_d)$ is routed along any outgoing 
arc containing a d-tuple of cyclic intervals $(I_1, \ldots, I_d)$ such that $l_i \in I_i$ for all i. 
In this case, multiple paths are represented by the scheme, so the intervals on 
the arcs of a given node may overlap, i.e. they do not form a partition of the 
nodes in V. 
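
The routing rule of a (k,d)-MIRS at a single node can be sketched as follows. The example labels the middle node (2,2) of a 3 × 3 grid with one pair of intervals per arc, which is an illustrative assumption rather than a construction taken from the cited papers; the names `in_cyclic` and `mirs_arcs` are hypothetical.

```python
def in_cyclic(interval, l):
    """True if label l lies in the (possibly cyclic) interval (a, b)."""
    a, b = interval
    return a <= l <= b if a <= b else (l >= a or l <= b)

def mirs_arcs(arc_labels, dest):
    """arc_labels: {arc: list of up to k d-tuples of intervals}.
    dest: d-tuple of destination labels.
    Returns every outgoing arc carrying a d-tuple (I_1, ..., I_d)
    with dest[i] in I_i for all dimensions i."""
    return [arc for arc, tuples in arc_labels.items()
            if any(all(in_cyclic(iv, l) for iv, l in zip(tup, dest))
                   for tup in tuples)]

# Node (2, 2) of a 3 x 3 grid, labels (row, column): a (1, 2)-MIRS labeling.
labels = {
    "up":    [((1, 1), (1, 3))],   # any destination in row 1
    "down":  [((3, 3), (1, 3))],   # any destination in row 3
    "left":  [((1, 3), (1, 1))],   # any destination in column 1
    "right": [((1, 3), (3, 3))],   # any destination in column 3
}
```

A message for (1,3) may leave through either "up" or "right", which illustrates why the intervals at a node overlap when all shortest paths are represented.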

As noted, MIRS can be multipath. A routing based on a multipath routing 
scheme must choose one arc from the suggested ones. If a scheme represents all 
shortest paths it is called a full-information shortest path routing scheme. 



4.2.1 Compactness. The upper and lower bounds in Theorem 10 apply also 
to multi-dimensional interval routing. 



4.2.1.1 All-to-all All Shortest Paths Multi-dimensional Schemes. The 
first study of the space complexity of multi-dimensional schemes is summarized 
in the following theorem. 

Theorem 22. 

— For trees, rings and complete graphs there exist full-information shortest path 
(1,1)-MIRS. 

— For complete bipartite graphs there exist full-information shortest path 
(2,1)-MIRS. 

— For every n and i, $1 \le i \le d$, there exists a full-information shortest path 
$(n^{i-1}, \lceil d/i \rceil)$-MIRS on a d-dimensional torus. 

— For each i, $1 \le i \le d$, there exists a full-information shortest path 
$(\lceil 2^{i-1}/i \rceil, \lceil d/i \rceil)$-MIRS for a d-dimensional hypercube. 
Further results have since been obtained. 

Theorem 23. 

— For a d-dimensional butterfly there exists a full-information shortest path 
{2,3)-MIRS. 

— For a d-dimensional CCC there exists a full-information shortest path 
{2df,d)-MIRS. 

The question remains to determine the parameters of MIRS for star graphs. 

DIS and CON Models. A (k, d)-MIRS is also denoted as a (k, d)-DIS-MIRS in the 
literature, where a slightly modified CON-MIRS model was introduced as well. In a 
(k, d)-CON-MIRS every node is labeled by a unique d-tuple $(l_1, \ldots, l_d)$, where 
each $l_i$ is from the set $\{1, \ldots, n_i\}$ ($n_i \le n$). Each arc is labeled by a 
d-tuple of up to k cyclic intervals per dimension, 

$(\{I_{1,1}, \ldots, I_{1,k}\}, \ldots, \{I_{d,1}, \ldots, I_{d,k}\})$. 

In any node, a message with destination $(l_1, \ldots, l_d)$ is routed along any outgoing 
arc such that for all i the label $l_i$ is contained in the union of intervals in the 
i-th dimension. 
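
The CON-MIRS membership rule differs from the DIS rule only in how the intervals are combined, as the following sketch shows: an arc is admissible if, in every dimension, the destination label falls into at least one of the intervals listed for that dimension. The arc label used in the example and the names `in_cyclic` and `con_mirs_arcs` are hypothetical.

```python
def in_cyclic(interval, l):
    """True if label l lies in the (possibly cyclic) interval (a, b)."""
    a, b = interval
    return a <= l <= b if a <= b else (l >= a or l <= b)

def con_mirs_arcs(arc_labels, dest):
    """arc_labels: {arc: d-tuple whose i-th entry is a list of up to k
    intervals for dimension i}. dest: d-tuple of destination labels.
    Returns the arcs for which every dest[i] lies in the union of the
    intervals of dimension i."""
    return [arc for arc, dims in arc_labels.items()
            if all(any(in_cyclic(iv, l) for iv in ivs)
                   for ivs, l in zip(dims, dest))]

# One arc of a (2, 2)-CON-MIRS: two intervals in dimension 1, one in dimension 2.
labels = {"e": ([(1, 2), (5, 6)], [(3, 3)])}
```

With this label, destination (5,3) is routed along the arc "e", whereas (3,3) and (1,4) are not.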

If a graph G has a (fc, d)-CON-MIRS, then it has a (I, fc • d)-MIRS with the 
same memory requirements per node and the same routing paths. The converse 
does not hold (as an example we can take full-information shortest path routing 
on cube connected cycles). 

It is known that shortest path routing imposes high memory requirements 
for any routing scheme. There exist graphs for which each (k, d)-CON-MIRS 
and (k, d)-DIS-MIRS requires $k \cdot d = \Omega(n/\log n)$. 

In [43] it was shown that the DIS-MIRS model is asymptotically stronger 
than the CON-MIRS model when considering memory requirements of the 
full-information shortest path routing schemes of cube connected cycles. 

Theorem 24. [44] For a full-information (k,d)-CON-MIRS of the $CCC_m$ graph 
the following bound holds on k and d: 



We recall the result of Theorem 1^ that for CCCm, there exists a {2m^,m)- 
DIS-MIRS with the length of labels -\- mlogm -|- 0{m) bits and memory 
required per node 0{nnf) bits. 

An even better lower bound has been proved for Cayley graphs. 

Theorem 25. For a full-information (k,d)-CON-MIRS of the $S_m$ graph, the 
following bound holds on k and d: 

$k \cdot d = \Omega(2^{m/3})$ 

The main problem remains to develop an effective lower bound technique on $k \cdot d$ 
also for the DIS-MIRS model. 
4.2.2 Compactness Versus Congestion. There are only a few results about 
multipath MIRS with (asymptotically) optimal congestion on special networks. 

Theorem 26. P! There exists a multipath (2,d + 2)-MIRS on CCCd with 
asymptotically optimal congestion (1 + • tt, where tt is the forwarding 

index of CCCd- 

We give a trade-off between the congestion and the compactness of multipath 
MIRS for general graphs. 

Theorem 27. For any graph G of maximum degree bounded by $\Delta$ with 
forwarding index $\pi$ and a given s, $1 \le s \le n$, there exists a multipath 
$(2 + \lceil n/2s \rceil, 1)$-MIRS with congestion $\pi + n\Delta s$. 

As a consequence, for any planar graph of constant bounded degree there 
exists a multipath $(O(\sqrt{n}), 1)$-MIRS with asymptotically optimal congestion. 

The previous result is based on the fact that the schemes are multipaths. 
A natural question arises whether a similar result is possible for deterministic 
routing schemes too. A positive answer to this question is given below, using the 
probabilistic method. 

Theorem 28. [44] Let G be any connected graph of maximum degree $\Delta$ with 
forwarding index $\pi$. For any s such that $1 \le s \le n$, there exists a $(2 + n/2s)$-IRS 
with congestion $\alpha \cdot \pi + n\Delta s$, where $\alpha$ satisfies 




This theorem has an interesting consequence for planar graphs of bounded 
degree. For any planar graph, with degree bounded by a constant and with 
forwarding index tt, there exists a 0(-\/n log n)-IRS with asymptotically optimal 
congestion 0{tt). 



4.2.4 Compactness Versus Buffers. Efficient deadlock-free MIRS (DFMIRS 
for short) with respect to the compactness and the size (of buffers) on hypercubes, 
tori, butterflies and cube connected cycles have been presented. 

Theorem 29. 

— For every i ($1 \le i \le d$) there exists a $((2^{i-1}, \lceil d/i \rceil), 2)$-DFMIRS for a 
d-dimensional hypercube. (For $i = 1$, we obtain a $((1,d), 2)$-DFMIRS.) 

— For every n and i ($1 \le i \le d$) there exists a $((n^{i-1}, \lceil d/i \rceil), 4)$-DFMIRS on a 
d-dimensional torus. (For $i = 1$, we obtain a $((1,d), 4)$-DFMIRS.) 

— There is a $((2,3), 4)$-DFMIRS on a d-dimensional butterfly. 

— There is a {{2d^,d),2d + 6)-DFMIRS on a d-dimensional cube connected 
cycles. 
The above results can also be transferred to an analogous wormhole routing 
model. These results give evidence that for some 
well-known interconnection networks there are efficient deadlock-free 
multidimensional interval routing schemes despite the provable nonexistence of efficient 
deterministic (i.e. all-to-all single shortest-path) IRS (see the lower bounds in 
Theorem 11). The main question remains whether there are efficient deadlock-free 
MIRS also for wider classes of graphs, e.g. vertex symmetric graphs, planar graphs, 
etc. 



5 Conclusions 

We have presented a survey of recent developments in the complexity of oblivious 
compact communication schemes in point-to-point networks. Further study of 
combinatorial properties of path systems can be helpful in the design of efficient 
communication schemes w.r.t. various efficiency parameters. 

Unfortunately, we had to leave out several important lines of research in this 
area. One of these is the area of universal communication schemes and adaptive 
communication schemes in point-to-point networks. 



Acknowledgements. I would like to thank Richard B. Tan and Daniel Stefan- 
Abstract. The purpose of auditing an information system is to assess, among 
others for the organisation's management, that the system functions in the way 
it was intended. Because of the speed of developments in technology and the 
increasing complexity of infrastructures and information systems, auditing 
information systems is becoming more and more difficult. Knowledge of many 
aspects of information technology is required in order to give an opinion on the 
quality of information systems. Since it is nearly impossible to combine all this 
expertise in one person, co-operation between several disciplines is necessary. 
This paper gives an introduction to the different aspects of IT-auditing in 
general and demonstrates the difficulties that IT-auditors face when, for 
example, auditing an electronic commerce system. It indicates the need for 
co-operation and concludes by suggesting solutions for the auditors' 
problems. 



1 Introduction 



1.1 IT-Auditing 

IT-auditing (also referred to as ict-auditing, edp-auditing or information systems 
auditing) is a relatively new profession. That is one of the reasons why many 
definitions for IT-auditing exist, which is not uncommon also in other areas of 
information technology. One of these definitions is: 

“An IT-audit is an independent and impartial assessment of the reliability, security 
(including privacy), effectiveness and efficiency of automated information systems, 
the organisation of the automation department and the technical and organisational 
infrastructure of the automated information processing. This activity applies to both 
operational systems and systems under development.” 



* This paper is written on a personal basis and in no way represents the opinion of De 
Nederlandsche Bank NV. 
There are three sets of keywords in this definition: 

• Independent and impartial. This means that the person(s) performing the audit 
must not have a hierarchical relationship with the auditees or in any other way 
depend on them (e.g. financially). 

• Reliability, security, effectiveness and efficiency. These are the so-called quality 
aspects. In the audit and security literature many different sets of terms can be 
found. Usually they all cover the same aspects only in a different decomposition. 

• Information systems, automation department and infrastructure. These are the 
three possible objects of audits. The infrastructure is the technical as well as the 
organisational infrastructure. 



1.2 Quality Aspects 

As said in the previous paragraph, many sets of terms exist for the quality aspects of 
information technology. For example, according to the Dutch Association of 
Registered EDP-Auditors (NOREA) an IT-auditor assesses and advises on the 
following aspects of information technology: effectiveness; efficiency; exclusiveness; 
integrity; auditability; continuity; controllability. 

A more well-known set of terms is CIA (confidentiality, integrity and availability), 
often completed with auditability. It is important that an organisation defines its set of 
terms to make sure that everybody has a good understanding of what is covered by 
each term. The aim and scope of each audit must describe which quality aspects 
will be taken into consideration in that particular audit, e.g. only efficiency or a subset 
of the security terms. 



1.3 Objects of Audit / Types of Audit 

The three objects of audit (information systems, automation department and 
infrastructure) can be refined. Depending on the object of the audit a different type of 
audit will be performed with different skills / expertise required from the auditor. The 
following types of audit can be distinguished: 

• Computer centre (or data processing centre) audit, performed by the computer 
centre auditor; 

• Technical (hardware, system software, middleware, data communication 
components) audit, performed by the technical auditor; 

• System development audit and audit of selection of software packages, performed 
by the information system auditor; 

• Pre- and post-implementation application system audits also performed by the 
information system auditor. 
1.4 Information Processing Environment 

The relationship between the audit objects can be illustrated by a figure [10] that has 
been derived from a figure used in a publication by the Royal Dutch Institute of 
Chartered Accountants [6]. The figure originates from a centralised (mainframe) 
information processing environment, but by referring to functions rather than to 
departments, the figure is also applicable to decentralised or distributed information 
processing environments. 



Fig. 1. Information processing environment (boxes: IT function, comprising the 
system development function and the electronic data processing function; user 
function; administrative organisation) 
1.5 Types of Controls 

Generally, the measures to meet the security requirements that are set by an 
organisation are divided into three types: 

• general controls; 

• application controls; 

• user controls. 

General controls are measures that work for all (or at least for most) applications and 
they can be found in the automation department (both the development function and 
the data processing centre) and in the infrastructure. These are organisational controls 
as well as technical controls in hardware and system software (including middleware, 
tools and data communication components). 

Application controls are specific programmed controls in one application system. 
Usually they are specific for the business processes that are supported by the 
application. 

User controls normally also are specific for one application (or a set of applications 
that support one business process). These are controls in the administrative 
organisation and the internal control structure of the user department. 



2 Difficulties for IT-Auditors 



2.1 Professional Requirements 

An IT-auditor is an independent expert in the field of information technology and 
control theory. In order to meet standards set by professional audit organisations¹, he 
must have knowledge of: 

• Internal control theories and concepts; 

• Security principles and measures; 

• Information technology (hardware, software, infrastructure); 

• Financial/economic aspects of the control processes within an organisation
(especially planning, budgeting, decision-making, etc.);

• Management and organisation theories; 

• Administrative organisation and its models and systems; 

• Methods and techniques that are available for performing effective IT-audits;

• Generally accepted auditing standards and control methods and techniques;

• Knowledge of business processes. 



¹ e.g. the Dutch Association of Registered EDP-Auditors [7]

Because developments in all these areas move very quickly, it is difficult for an auditor
to keep up even in his own areas of expertise, and it is practically impossible to keep
pace with the developments in all areas.



2.2 Trends in Information Technology and Control Theories 

Two of the most important areas of knowledge of an IT-auditor have shown major
developments during recent years. These developments and trends require new
knowledge, new approaches and new ways of thinking from an auditor. The developments
in information technology are widely known and can be followed in many media:
Internet, WWW, electronic commerce, Java, SET, client/server applications,
object-oriented programming, data mining, integrated system management tools,
data communication, new chip technologies, etc.

The second area where major changes can be seen is in control theories and auditing. 
International developments concerning internal control and corporate governance 
indicate that management has to become more aware and has to take more 
responsibility for the control of an organisation and the reporting on the internal 
control system. The trend is towards controlling processes rather than controlling 
products / components. 



2.3 Need for Co-operation (1) 

Looking at the enumerations in paragraphs 2.1 and 2.2, it is obvious that, even within
one organisation, too many aspects have to be dealt with for one IT-auditor to have
all the necessary knowledge. Therefore, depending on the object of the audit,
co-operation is necessary with:

• other IT-auditors; 

• financial auditors (chartered accountants); 

• IT-specialists; 

• security specialists; 

• software developers / engineers; 

• etc.



3 Example: Electronic Commerce

As an example to illustrate the difficulties for an IT-auditor, the following figure shows
the many parties and components that can be involved in a cross-border electronic
commerce trade transaction. This concerns a real business-to-business transaction and
not, for example, the purchase of a book at Amazon by an individual. The transaction
starts in Organisation X with an electronic order that is initiated in one of the business
applications. This order is then translated into an electronic order message that is sent
to the trading partner via the internal and external network. The trading partner
delivers the goods and sends an electronic invoice to Organisation X. Organisation
X pays the invoice by means of electronic banking. To keep this example simple, we
will not take into consideration the possible involvement of a trusted third party /
certification authority or a third party service provider.



[Figure: parties and components: Organisation X with a. organisation structure,
b. business applications, c. EDI software, d. communications interface, firewall, etc.;
e. external network; f. bank; g. trusted third party / certification authority;
h. third party service provider; i. trading partner (across the border).]

Fig. 2. Parties and components involved in an electronic commerce trade transaction



We are going to look at this transaction from the viewpoint of the internal IT-auditor
of Organisation X. What are the consequences for the auditor when management asks
the simple question: Are the trade transaction and its processing reliable?

The auditor will have to decompose the term reliable into other quality aspects, e.g.
authorised, correct and timely. This makes it easier to relate the quality aspects to
threats and, in that way, to the necessary measures. In the whole process of the trade
transaction a number of threats exist that can occur in various parties or components.
For each threat in the following list, a reflection is given on possible causes and on
consequences for the auditor. These reflections do not claim to be exhaustive and are
only intended to illustrate the many different aspects an auditor encounters.

Unauthorised creation or change of an order(message), invoice(message),
payment(message)

If the organisation structure does not have an adequate segregation of duties
(formalised and approved), the unauthorised creation or change of an order is a real
risk. This affects not only the user departments but also the automation
department. A well-known segregation is the one between application development and
the data processing centre. This segregation must guarantee that no unauthorised
changes in business applications become operational. The business applications must
incorporate the segregation of duties (of the user departments) in their structure.
Particular problems for an auditor are the creation of messages outside his own
organisation and the creation by the trading partner. Digital signatures can help
to check whether a message was sent by an authorised trading partner. This is, however,
a check on the authorisation of the trading partner as an organisation; checking
whether the message was created in an authorised way within the trading partner's
organisation is the responsibility of that trading partner. To obtain information or
assurance on this, the auditor must cooperate with the (internal or external) auditor of
the trading partner.
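
As an illustration of how an auditor might test the segregation of duties described
above, the following minimal sketch checks a table of duty assignments against a list of
duty pairs that one person should not combine. The duty names, the conflicting pairs and
the function name are assumptions made for this example; they are not taken from any
particular organisation or tool.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical pairs of duties that should not be held by the same person.
CONFLICTING_DUTIES: Set[FrozenSet[str]] = {
    frozenset({"create_order", "approve_order"}),
    frozenset({"develop_application", "operate_production"}),
}


def find_violations(assignments: Dict[str, Set[str]]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (person, conflicting duties) for every violated segregation rule."""
    violations = []
    for person, duties in assignments.items():
        for pair in CONFLICTING_DUTIES:
            # A rule is violated when both duties of a pair are assigned.
            if pair <= duties:
                violations.append((person, tuple(sorted(pair))))
    return sorted(violations)


# Illustrative data only.
assignments = {
    "alice": {"create_order", "approve_order"},
    "bob": {"develop_application"},
}
violations = find_violations(assignments)
```

A check of this kind covers only the formal assignment of duties; whether the business
applications actually enforce the same segregation still has to be assessed separately.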

Unauthorised disclosure of order(message), invoice(message), payment(message)

Within an organisation, access to information must be granted on a need-to-know
basis. Protection of the information against unauthorised disclosure can be provided
by:

• the authorisation matrix in the organisation structure;

• access control measures in the general controls offered by the data processing 
centre; 

• access control measures in the business applications; 

• encryption of messages. 

Of particular interest for the auditor are the measures that are taken within the trading
partner's organisation. He has no influence on them, nor the authority to assess their
quality himself. Therefore he has to cooperate with the internal or external auditor of
the trading partner.

Unauthorised deletion of order(message), invoice(message), payment(message) 

Whenever records and messages are stored (e.g. after processing by a
business application or after message translation), there is a risk of unauthorised
deletion, either intentional (e.g. to conceal fraudulent transactions) or accidental
(e.g. because of wrong parameter settings concerning retention periods).

The auditor has to be sure that not only appropriate measures exist to prevent the 
unauthorised deletion, but also measures to detect and restore such a deletion. 
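
One simple detective measure, given here only as a sketch, is to number stored messages
consecutively and to look for gaps in the sequence. The assumption that every stored
message carries such a sequence number, and the function name, are introduced for this
example.

```python
from typing import Iterable, List


def missing_sequence_numbers(stored: Iterable[int], first: int, last: int) -> List[int]:
    """Return the sequence numbers in [first, last] that are absent from the store."""
    present = set(stored)
    return [number for number in range(first, last + 1) if number not in present]


# Illustrative data: messages 3 and 6 are no longer in the store.
gaps = missing_sequence_numbers([1, 2, 4, 5, 7], first=1, last=7)
```

A gap only signals that a record is missing; restoring it still requires a back-up or a
copy held by the other party.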

Unauthorised processing of a message 

The receipt of authorised messages, meaning that they are sent by legitimate,
acknowledged trading partners, does not mean that such a message can be processed
without further controls. The message must not have been received before, all
necessary data elements must be present and correct, etc. This is the area of EDI
software (message translation) and application controls in the business applications.
The auditor has to check that the necessary controls are built in and that they function
correctly.
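
The two controls mentioned above (rejecting a message that was received before and
checking that the necessary data elements are present) can be sketched as follows. The
element names and the dictionary representation of a message are assumptions for this
illustration and do not correspond to a particular EDI standard.

```python
from typing import Dict, List, Set

# Hypothetical data elements that every incoming message must carry.
REQUIRED_ELEMENTS = ("message_id", "sender", "order_number", "amount")


def check_message(message: Dict[str, str], seen_ids: Set[str]) -> List[str]:
    """Return the list of problems found; an empty list means the message is accepted."""
    problems = [
        f"missing element: {name}"
        for name in REQUIRED_ELEMENTS
        if name not in message or message[name] in (None, "")
    ]
    message_id = message.get("message_id")
    if message_id and message_id in seen_ids:
        problems.append("duplicate message")
    if not problems:
        # Only accepted messages are recorded as processed.
        seen_ids.add(message_id)
    return problems


seen: Set[str] = set()
order = {"message_id": "M1", "sender": "partner", "order_number": "O-1", "amount": "100.00"}
first_result = check_message(order, seen)
second_result = check_message(order, seen)
incomplete_result = check_message({"message_id": "M2", "sender": "partner"}, seen)
```

Here the first submission is accepted, the second is flagged as a duplicate, and the
incomplete message is rejected without being recorded.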

Denial of sending a message 

Both Organisation X and the trading partner must have means to prove that the other
party did send a message. If, for example, the trading partner claims to have received an
order message and Organisation X denies having sent it, technical measures must
provide proof. Procedural measures, laid down in a trade agreement, must exist for
settling a dispute. Retention periods also play a role.

Denial of receiving a message 

Denial of receiving a message can be caused by the wish to delay a process (like 
having to pay the invoice later) or by technical reasons (the message was really not 
received because it was sent to the wrong address). The auditor has to pay attention to 
the existence of an acknowledgement mechanism and to the correct address 
translation. 
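
As a sketch of what monitoring an acknowledgement mechanism could look like, the
following fragment lists the sent messages for which no acknowledgement has arrived
within an agreed period. The message identifiers, the one-hour period and the function
name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set


def overdue_acknowledgements(
    sent: Dict[str, datetime],
    acknowledged: Set[str],
    now: datetime,
    max_wait: timedelta,
) -> List[str]:
    """Return ids of sent messages that are still unacknowledged after max_wait."""
    return sorted(
        message_id
        for message_id, sent_at in sent.items()
        if message_id not in acknowledged and now - sent_at > max_wait
    )


# Illustrative data only.
sent = {
    "INV-001": datetime(1998, 11, 21, 9, 0),
    "INV-002": datetime(1998, 11, 21, 9, 30),
    "INV-003": datetime(1998, 11, 21, 11, 45),
}
acknowledged = {"INV-001"}
now = datetime(1998, 11, 21, 12, 0)
overdue = overdue_acknowledgements(sent, acknowledged, now, timedelta(hours=1))
```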

Claim of receiving a message that was not sent 

The trading partner can, for instance, claim to have received an order message that
Organisation X says it has not sent. Digital signatures can show who is right. The
auditor must assess the correct set-up and functioning of the digital signature
scheme.

Delay in delivery of messages / Denial of service (availability network, other 
components) 

Malfunctioning of hardware or software components can cause delay in delivery or 
denial of service. The auditor has to assure himself that for all components, if 
necessary, back up and fallback measures exist. 

Non-compliance with legal/fiscal requirements 

In a situation (country) where electronic trade documents have a legal status and
original paper documents are no longer required, the auditor must check whether the
business applications meet legal and/or fiscal requirements (for example concerning
record retention periods or accessibility of records).

In summary, before he can answer management's question and thus give an opinion
on the reliability, the auditor has to assess (or rely upon):

• the user controls in the organisation structure; 

• the general controls of the automation department; 

• the application controls in the business application; 

• the user controls around the business application; 

• the general controls in the data communication infrastructure (including the
electronic commerce software, interfaces, firewalls, etc.);

• the general controls concerning the external network; 

• controls concerning the trading partner. 



3.2 Additional Requirements for the Auditor 

Besides the knowledge mentioned in paragraph 2.1, in this example the auditor must
also have knowledge of:

• (inter)national legislation on e.g. legal status of electronic trade documents, record 
retention, data interchange agreements, etc.; 

• (inter)national legislation on cryptography, digital signatures, etc.; 

• specific IT / technical aspects: 

• strength of cryptographic algorithms used; 

• firewalls; 

• EDI software / EDI standards;

• evaluation / certification schemes in the country of the trading partner; 

• trusted third parties / certification authorities. 



3.3 Need for Co-operation (2) 

In addition to paragraph 2.3, there is also a need for co-operation with: 

• legal specialists 

• cryptographers 

• auditors of trading partner 

• standards bodies 

• EDI organisations.



4 Possible Solutions 

If an IT-auditor had to seek co-operation with all the parties mentioned in
paragraphs 2.3 and 3.3 for each audit assignment, audits would take too long and
become too expensive. In order to decrease the need for co-operation, a number of
activities are possible. These activities have to be undertaken by both individuals and
(professional) organisations.

It is emphasised that co-operation is not only a necessity from the viewpoint of the
auditor; it is just as necessary for hardware/software developers/engineers to
understand audit and control concepts and requirements.

Activities: 

• Stimulate the setting up of, and participation in, working groups, task forces, etc.
These working groups will have to study and report on specific issues or subjects.
The results of such studies must be made available to all professionals. This must
be done both on a national and on an international level. Parties that have to play a
role in this respect are professional organisations of engineers, auditors, etc. and
computer societies.

• Make research results (from both universities and industry) more widely and more
easily accessible, and make them comprehensible for auditors.

• Make control and audit theories and techniques better known to researchers and
engineers, for example by including this subject in regular education and training
programmes.

• Stimulate the use of independent security evaluations. It would be helpful if more
technical components, products and systems were available that have been evaluated
against generally accepted evaluation criteria (like the harmonised Common
Criteria). An auditor would know what to expect from such a product and could
make use of that knowledge for his overall judgement without having to perform
the evaluation (audit) again.

• Stimulate and participate in the development of standards, both technical and
procedural (guidelines). Stimulate the use of standards.

• Stimulate the development and use of benchmarks.



Abstract. Traditional business processes are strongly impacted by the
use of emerging information technology that provides new perspectives
for strategy and relationship management between the main actors of
electronic commerce. The fast-evolving World Wide Web platform
emphasizes the need for new paradigms to deal with complex information
systems that meet the requirements of innovative economic models.
Integrated secure access to heterogeneous information, knowledge
publishing, and personalized content delivery are important issues to be
addressed. The paper concentrates on the advantages and limits of
a document technology approach for dealing with new levels of interaction
and control in business information and communication systems.



1 Introduction 

The rapid growth of the global information infrastructure, essentially organized
around the fast-evolving Internet and World Wide Web environments, clearly
alters the traditional social, political and economic aspects of society.
Conducting business over the Internet is incontestably a domain going through
major developments, drastically changing relations between the various actors
of electronic commerce. In order to cope with new strategic objectives and to
compete in a global market, small and medium enterprises as well as larger
organisations need to rely on innovative business models that deeply affect
their existing traditional organizational structures and processes.

A major aspect to be addressed is the design and implementation of effective
communication and information systems answering the needs of new business
processes. The user-friendly and platform-independent access to information
through World Wide Web browsers emphasizes the necessity to provide
mechanisms that allow integrated access to and manipulation of heterogeneous data
shared between complex pieces of distributed software. During the last decade,
developing applications over the Internet that address those interoperability
issues has become a major concern for many communities of researchers, such
as those involved in the design of open hypertexts, the development of CORBA
technology and integrated access to distributed databases, the development of
agent-based systems, and the definition and use of markup languages (SGML, HTML,
XML) to represent and exchange structured information.

The paper will focus on the potential benefits brought by the use
of a so-called structured document approach, based on tagged representation of
information, to address the interoperability issues in the framework of electronic
commerce applications. The first part presents and summarizes relevant
problems to be dealt with in the specific context of business processes; the second
is dedicated to the description of the underlying concepts and evolution
of document technology; and the third discusses important key issues
currently under investigation or to be addressed in future research.

2 Information Systems in Business Processes

The current evolution of electronic commerce is indubitably promoting the
development of an information-based economy emphasizing complex relations
between business partners, consumers and administrations. In order to face
increasing global competition and customer expectations, companies are confronted
with the redesign of business processes that cross the boundaries of their own
organization and are often jointly owned by the company and its customers or
suppliers. The redefinition of such new processes relies on the access to and
processing of distributed interrelated pieces of information, where the use of
document technology may make some contributions.

Extensive use of documents - First, the wide variety of activities involved in
electronic commerce already makes extensive use of numerous electronic
documents: product and service descriptions, project descriptions, spreadsheet
documents, mailings, contracts, and financial, administrative or technical documents.
Unfortunately, most of these documents rely on proprietary formats, and the exchange
of information between different document processing systems remains a difficult
problem. Standardization efforts (such as XML) open attractive perspectives for
dealing with this problem.

Agent based market systems - In order to address the lack of information and
communication structure in web-based electronic commerce applications, much work
has been undertaken to develop open agent-based market systems. The paradigm
of interacting agents relies on the use of sometimes complex messages to
exchange knowledge between agents.

Development of complex Web based workflow - Workflow management systems
provide an automated framework for handling complex business processes
inside enterprises as well as interactions with business partners. Many workflow
management systems currently provide Web interfaces, unfortunately often
limited to different proprietary workflow engines. A combined use of XML and
Java technologies could facilitate the implementation of transportable agents to
enhance both platform independence and reusability between processing
applications.

Electronic data interchange - Exchange of electronic information in business
to business activities has been of great importance for a long time. The EDI
(Electronic Data Interchange) standard has been elaborated to answer those
needs. The possibility of extending traditional EDI capabilities by the integrated
use of XML to define self-describing messages is currently under investigation to
facilitate the implementation of a wide range of processing operations, such as
validating user input or routing documents for workflow applications.
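
To make the idea of a self-describing message concrete, the following minimal sketch
routes and validates an XML order message using the non-validating parser from the
Python standard library. The element and attribute names (order, buyer, item, code,
quantity) and the routing table are assumptions invented for this illustration; they are
not part of any EDI or XML/EDI specification.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical routing table: the root element name identifies the message type.
ROUTES = {"order": "order-processing", "invoice": "accounts-payable"}


def route_message(xml_text: str) -> str:
    """Choose a destination from the name of the root element."""
    root = ET.fromstring(xml_text)
    return ROUTES.get(root.tag, "manual-handling")


def validate_order(xml_text: str) -> List[str]:
    """Return the problems found in an order message (empty list if none)."""
    root = ET.fromstring(xml_text)
    if root.tag != "order":
        return ["not an order message"]
    problems = []
    if not (root.findtext("buyer") or "").strip():
        problems.append("missing buyer")
    items = root.findall("item")
    if not items:
        problems.append("no items")
    for item in items:
        quantity = item.get("quantity", "")
        # The quantity must be a positive integer written in ASCII digits.
        if not (quantity.isascii() and quantity.isdigit() and int(quantity) > 0):
            problems.append(f"bad quantity for item {item.get('code')}")
    return problems


GOOD_ORDER = '<order id="O-1"><buyer>Organisation X</buyer><item code="A12" quantity="3"/></order>'
BAD_ORDER = '<order id="O-2"><buyer></buyer><item code="B7" quantity="0"/></order>'
```

Because the tags name the data they contain, both the routing decision and the input
checks can be made without prior knowledge of field positions in the message.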

Internationalization - Another important issue relates to the multilingual
nature of data, an aspect of the problem brought to light by the global access to
information provided by the World Wide Web environment. In this respect,
document technology may also make a valuable contribution. An appropriate
representation of the linguistic structure may considerably facilitate the
consistency checking of multilingual versions of a document.

3 Structuring Electronic Documents: An Evolving
Technology

3.1 Representing Structured Documents 

Designing electronic document models for the production of high quality
typographical material has been the original aim of computer scientists involved in
the structured document research area. Inspired by the traditional editorial and
publishing processes, such models are based on a dual view of the document: the
logical structure, reflecting the author's intentions, and the physical structure, close
to the typographer's perception of the document, which makes it possible to specify
the typographic resources appropriate for emphasizing the author's intentions.

In order to allow the generation of consistent documents, and thus
enhanced processing operations, the concept of document classes has been
introduced to adequately represent various types of documents.

Standardization efforts have been undertaken to facilitate the representation
and exchange of such structured documents between various processing
applications. SGML (Standard Generalized Markup Language), relying on an attribute
grammar formalism to define document classes (DTD - Document Type Definition)
and tagged data to represent document instances, has been and is still used
by publishing organisations which have to deal with complex and highly structured
documents.

Customized SGML DTDs have been elaborated in order to agree on shared
generic document models between specific communities of users or applications.

— The international standard ISO 12083 has been elaborated by the Association
of American Publishers and the European Physical Society to provide
standard methods for marking up scientific documents.

— CALS (Computer-aided Acquisition and Logistic Support) has been
specified for the representation of technical documentation by the American
Department of Defense.

— TEI (Text Encoding Initiative) is the result of a collaboration between
several international research organisations in order to provide formal
guidelines for the representation of textual linguistic documents.

— HTML (Hypertext Markup Language) is a well-known application of SGML
for the representation of documents on the World Wide Web.

The Text Encoding Initiative - The Text Encoding Initiative differs from
other DTD design efforts in that it was the first work aiming at
providing guidelines for the modular design of customized DTDs on the basis
of an existing set of logical components. The approach is based on the
object-oriented paradigm and implemented by use of the entities mechanism
proposed in the SGML standard.

The definitions of pre-existing elements are organized in terms of classes; each
new element may inherit a content model and associated attributes from them.



3.2 Representing Hyperdocuments 

The concept of electronic structured documents, initially considered as an
abstract representation of their paper counterpart, has evolved towards a more
complex way of representing semantically rich interrelated pieces of information,
so-called hypertext documents. Moreover, the nature of the information
contained in documents, especially multimedia components such as sound
and video, adds a new temporal dimension to be dealt with in various
applications.

The ISO standard HyTime (Hypermedia/Time-based Structuring Language)
has been elaborated to encompass those important features of electronic
documents. HyTime is based on SGML; an SGML document may be
considered as a HyTime document by use of the so-called Architectural Form, a
mechanism inspired by the object-oriented paradigm. HyTime introduces new
concepts in order to extend the document model.

Roughly speaking, SGML provides a powerful means of describing documents
as a hierarchy of embedded logical components, including cross-references in
terms of uni-directional pointers. Additional logical, non-structural information
may be defined by use of an attribute mechanism.

In essence, the HyTime philosophy is comparable to that of SGML; it provides a way
to describe the logical content of documents without providing mechanisms to
specify the nature of the processing itself. Those aspects are dealt with in another
standard, DSSSL - Document Style Semantics and Specification Language. The
main contributions of HyTime in comparison with SGML are the following:

— a better link model that makes it possible to represent complex hyperdocuments

— a way of representing multimedia data

— a querying language (HyQ) to retrieve pieces of information from structured
documents

The locator concept - In order to refer to specific data objects in documents,
HyTime provides the concept of a locator, which makes it possible to locate data
according to three mechanisms: by naming (for example, an SGML identifier), by
counting (for example, the ninth element of a list) and by querying (for example, the
element whose attribute value is x). The following examples illustrate potential
uses of such locators.
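
The three location mechanisms can be mimicked, as a rough analogy only, with the
limited XPath subset of the Python standard library. This sketch does not use HyTime
syntax or a HyTime engine; the sample document, its element names and its attributes are
invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample document.
DOC = ET.fromstring(
    "<doc><list>"
    '<item id="a" lang="fr">un</item>'
    '<item id="b" lang="en">two</item>'
    '<item id="c" lang="en">three</item>'
    "</list></doc>"
)

# By naming: address the element through its unique identifier.
by_name = DOC.find(".//*[@id='b']")

# By counting: address the third item of the list (positions start at 1).
by_count = DOC.find("./list/item[3]")

# By querying: address every item whose lang attribute has the value "en".
by_query = DOC.findall(".//item[@lang='en']")
```

Locating by name is robust against reordering of the list, whereas locating by counting
depends on the position of the element and may therefore break when the document is
edited.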



Linking parts of documents - HyTime defines two kinds of links: contextual
links (similar to the reference mechanism proposed by SGML) and independent
links, to be used for numerous purposes such as annotating documents, linking
multilingual versions of documents (see figure 1), or representing complex
hyperlinks such as illustrated in figure 2.




    <!ELEMENT transl-link - O EMPTY>
    <!ATTLIST transl-link
        HyTime    NAME    ilink
        id        ID      #IMPLIED
        anchrole  CDATA   #FIXED "French-English"
        linkends  IDREFS  #REQUIRED>

    <title id=t-f> Documents structurés </title>
    <title id=t-e> Structured documents </title>
    <transl-link linkends="t-f t-e">



Fig. 1. Use of independent links to represent multilingual documents




Fig. 2. Representation of complex hyperlinks 



Representing a video sequence - To deal with multimedia components, HyTime
proposes a model for space and time based on finite axes, each of which defines an
addressable range of quanta. Each quantum represents a discrete position along
the axis and has a coordinate which must be a positive number. Figure 3
illustrates the definition of a multidimensional space that makes it possible to
address clipped regions of video sequences during a given period of time.



[Figure: video frame with a "Height" axis; the other axis labels are not recoverable]

    <fcsloc locsrc=movie1 impfcs=video-fcs>
    <extlist>
    <dimspec> 1 50 </dimspec>
    <dimspec> 50 </dimspec>
    <dimspec> 76 25 </dimspec>
    </extlist>
    </fcsloc>



Fig. 3. HyTime multidimensional space representation 



3.3 Documents as Pieces of Information in Global Information 
Systems 

The considerable development of the World Wide Web and, more specifically, of the
HTML application of SGML to represent hyperdocuments over the Internet
clearly promoted the use of document technology to handle structured flows
of information within distributed environments. In this sense, the emergence of
XML (Extensible Markup Language), a W3C recommendation, opens attractive
perspectives for many reasons.

First, XML proposes a simplified syntax of SGML, abolishing particular
features (such as the potential omission of end tags in document instances and the
potential abbreviation of tags) and providing a specific syntax for empty elements,
thus considerably facilitating the development of document-oriented
applications.

Secondly, the combination of XML, XSL and XLink integrates the major
results of research in the area of structured documents and hypertext,
allowing the separation between document content and any related processing
operation as well as offering a powerful formalism to represent complex
multimedia hyperdocuments.

Finally, XML distinguishes between well-formed documents, which are not
necessarily associated with a DTD, and valid documents, which conform to a
generic model. This is an important aspect in addressing the interoperability issues
between applications, promoting independence from proprietary
document exchange formats.
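
The notion of well-formedness can be illustrated with the XML parser of the Python
standard library, which is non-validating: it checks that a document is well-formed but
does not check conformance to a DTD. The function name and the sample strings below
are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Every start tag has a matching end tag: well-formed.
closed = is_well_formed("<list><item>one</item></list>")

# The end tag of <item> is omitted, which SGML may allow but XML does not.
omitted_end_tag = is_well_formed("<list><item>one</list>")

# XML's dedicated syntax for an empty element.
empty_element = is_well_formed("<br/>")
```

Checking validity in addition would require a validating parser and the DTD of the
document class concerned, which this sketch deliberately leaves out.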

As applications of the emerging XML W3C recommendation, we may cite:

— MathML (Mathematical Markup Language), a W3C Recommendation for
describing mathematical notation, aiming at capturing both its structure
and its content.

— SMIL (Synchronized Multimedia Integration Language), which has been designed
to specify hypermedia presentations taking into consideration spatial layout
and timing relations between multimedia components as well as hyperlinking
for time-based media.

The specification and evolution of markup languages (SGML, HyTime and
XML) progressively integrated the state of the art of research in document and
hypertext representations. Figure 4 gives a synthetic presentation of the relations
between those standards and their applications.




[Figure: diagram relating HyTime, ISO 12083, TEI, HTML, XML-XSL-XLink, MathML and
SMIL]

Fig. 4. Relations between markup languages



To summarize the considerations about the evolution of document technology,
it appears that the role of electronic documents
in distributed information systems is gaining in importance. They have become
active pieces of information whose content is not only provided by the user but
also altered by modifications of the environment (such as an update in a database);
they act as user interfaces (such as WWW forms); they may also be considered
as structured dataflow between various applications.

As a consequence, the abstract model of a document is no longer limited
to the editorial logical structure appropriate for printing or for visualizing logical
components, but has to rely on an adequate representation to be dealt with by
several applications and users for multiple purposes.

Let us illustrate this idea in the specific business processes domain. An impor- 
tant aspect of electronic commerce is related to the notion of electronic market 
mechanisms. As an example in this context, the traditional handling of requests 
for proposals may be improved by the use of a flexible structured representation 
of the information that takes into consideration the role to be played by various 
actors in the decision process. 

First, even if the specification of the content of a request for proposal depends on
the domain it relates to, a generic framework for the specification may be helpful
in many respects. Relying on an agreed, established structure provides a
powerful means to check the validity of the request: for example, in the computer
science domain, checking consistency between administrative, technical and
implementation requirements may be of crucial interest. It helps the clients in
formulating their expectations according to a schema that facilitates accurate
answers from the suppliers. Finally, it makes it possible to automate, to a certain
extent, the process of analysing the answers.

Providing multiple customized partial views on the information is another
important issue. An appropriate document model will allow the various actors
in the client's organization to access relevant detailed or synthesized parts of
the information, to annotate their own version of the documents or enhance it with
information computed from other sources, and to protect access to
confidential information such as the evaluation criteria.
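
A minimal sketch of such a partial view is given below, under the assumption that each
element of the request may carry an attribute listing the roles allowed to see it. The
attribute name audience, the role names and the sample request are hypothetical and
serve only to illustrate the idea.

```python
import copy
import xml.etree.ElementTree as ET


def view_for(root: ET.Element, role: str) -> ET.Element:
    """Return a copy of the document containing only what the role may see."""
    view = copy.deepcopy(root)
    _prune(view, role)
    return view


def _prune(element: ET.Element, role: str) -> None:
    # Iterate over a snapshot of the children, since some may be removed.
    for child in list(element):
        audience = child.get("audience")
        if audience is not None and role not in audience.split():
            element.remove(child)
        else:
            _prune(child, role)


# Invented request for proposal: the criteria are restricted to two roles.
RFP = ET.fromstring(
    "<rfp>"
    "<requirements>Deliver a workflow system</requirements>"
    '<criteria audience="evaluator manager">price 40, quality 60</criteria>'
    "</rfp>"
)
supplier_view = view_for(RFP, "supplier")
evaluator_view = view_for(RFP, "evaluator")
```

Working on a copy leaves the source document intact, so that every actor's view is
derived from the same single representation.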



4 Some Key Issues in Document Technology 

Needs for modular representation of documents - The design of formal document
descriptions, for instance in terms of SGML Document Type Definitions, has often
resulted in monolithic definitions of customized document classes, sometimes
built in a modular way by use of the SGML entities mechanism. The Text
Encoding Initiative addressed the problem of designing document models by
taking advantage of existing predefined document components and proposing
explicit guidelines inspired by the object-oriented paradigm.

The problem of reusing logical document components is obviously a major
issue to be addressed in facing the problem of interoperability. It is similar to
the problem encountered in software engineering, and an object-oriented design
may be considered a promising approach for dealing with those aspects.

In order to enhance the processing capabilities on documents, taking into
consideration the structured description of micro structures is another important
aspect. Identifying and accurately representing the structure of micro elements,
such as dates, URLs or ISBN numbers, is part of the problem. HyTime,
for instance, introduces the concept of lexical types to deal with these specific
components [?].
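
As a rough illustration of what a lexical type for a micro element might check, the following Python sketch validates two of the micro elements mentioned above. It does not use HyTime syntax; the function names and the choice of ISO 8601 dates and ISBN-10 numbers are assumptions made only for this example.

```python
import re
from datetime import date

# Hypothetical lexical pattern for a date micro element (ISO 8601, YYYY-MM-DD).
ISO_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def parse_iso_date(text):
    """Return a date if the text is a well-formed calendar date, else None."""
    match = ISO_DATE.match(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # lexically fine, but not a real calendar day
        return None


def is_valid_isbn10(text):
    """Check the lexical form and the mod-11 checksum of an ISBN-10."""
    digits = text.replace("-", "").replace(" ", "")
    if re.fullmatch(r"\d{9}[\dX]", digits) is None:
        return False
    # Weights run from 10 down to 1; a final 'X' stands for the value 10.
    total = sum(
        (10 - index) * (10 if char == "X" else int(char))
        for index, char in enumerate(digits)
    )
    return total % 11 == 0
```

Once the micro structure is explicit in this way, an application can sort, compare or cross-check such elements instead of treating them as opaque character data.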

Applying multiple structures to documents - The logical organisation of documents
depends heavily on the processing operations to be applied to them. The
so-called logical editorial structure, aimed at describing an abstract representation
of the document for multiple rendering purposes, is still the predominant
structure in most document processing systems. Sharing documents between
numerous different applications emphasizes the need to apply several
logical structures to the same data content.

For instance, checking consistency between multilingual versions of documents
requires knowledge of the linguistic structure of the information. There
is no isomorphism between the editorial subdivision of sentences into paragraphs
and the linguistic structure of a document. The SGML standard specifies the
possibility of applying concurrent structures to documents by tagging document
instances according to several DTDs; this feature has been eliminated
in the XML specification.
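
To make the lack of isomorphism concrete, the sketch below keeps two structures as separate lists of character spans over the same content and reports the pairs that cross, that is, overlap without one containing the other. This is a stand-off illustration, not the SGML concurrent-structure mechanism itself; the `Span` type and the offsets are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    kind: str   # name of the element in its own structure
    start: int  # offset of the first character
    end: int    # offset just past the last character


def crosses(a, b):
    """True if the spans overlap but neither contains the other."""
    if a.end <= b.start or b.end <= a.start:
        return False  # disjoint
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return not (a_in_b or b_in_a)


def crossing_pairs(first, second):
    """All pairs of spans that cannot be nested in a single tree."""
    return [(a, b) for a in first for b in second if crosses(a, b)]


# Hypothetical offsets: two editorial blocks and three linguistic units.
editorial = [Span("block", 0, 16), Span("block", 16, 32)]
linguistic = [Span("s", 0, 6), Span("s", 6, 24), Span("s", 24, 32)]
conflicts = crossing_pairs(editorial, linguistic)
```

Every crossing pair is a place where a single tree-structured markup cannot represent both structures at once, which is why several concurrent structures over one content are needed.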

Enhancing logical description of documents with attributes - The abstract structure
of documents is not only based on structural properties; the use of attributes
to provide additional logical information is another critical issue. Meta
information qualifying the content of documents is already used for multiple
purposes. In a text processing system, specifying the language of document
components makes it possible to apply appropriate language-dependent spelling
checkers. In business applications, specifying appropriate access rights to portions
of documents appears to be an important aspect of workflow management. Adding
meta information in order to improve querying methods is another important
use of attributes [?].

Integrating databases and documents - In business applications, highly structured
information has long been confined to databases. Promoted by
World Wide Web technology, the use of structured data flows is
gaining considerably in importance. The problem of storing, accessing and
updating such data in an efficient and reliable way becomes an important aspect
to be dealt with.

In this respect, database technology offers mechanisms to provide secure and
efficient access to information. Providing an appropriate representation of
structured files in databases, allowing functionalities such as concurrency control,
querying and versioning, is of major concern and needs to be investigated [2].

Abstract. We show how the speed of a BSP (Bulk Synchronous Parallel)
computer depends on the parameters p, g, and l of the model.
Depending on the values of the parameters, BSP belongs to the first
(sequential) or the second (parallel) machine class, or to neither of these classes.
The relation between BSP and the class of weak parallel machines is also
examined. It turns out that BSP does not fit the concept of weak (or
pipelined) parallelism. Consequences of membership in different machine
classes for the physical feasibility of BSP computers are discussed. The
main conclusion is that BSP with properly chosen parameters qualifies
as a practical model, but it is unable to exploit all the parallelism
allowed by the laws of physics.



1 Introduction 

One of the main aims of research in the area of parallel computing is the
search for a good model of parallel computers. Such a model is necessary for
the design and complexity analysis of parallel algorithms [?]. There is also a need
for a model which would allow the development of a parallel complexity theory
better connected to practical parallel computers than the classical theory based
on the PRAM model [?]. In the past, two main branches of models emerged. Massively
parallel algorithms are usually designed for a simple but unrealistic PRAM model.
This model is also often used in the complexity theory of parallel algorithms.
On the other hand, when developing parallel programs on actually existing parallel
computers, programmers use models specific to a concrete architecture.
Such programs are efficient but they cannot be easily ported to another type
of computer. Researchers recognized the need for a bridging model of parallel
computation. Such a model should serve as a common standard in the same
way as von Neumann's machine is used in sequential computing. Valiant [?]
introduced the bulk-synchronous parallel (BSP) computer as a candidate for
a bridging model. In BSP, computation runs in supersteps on p processors.
There are only two other parameters, l (communication latency and barrier
synchronization time) and g (network bandwidth). The BSP model has been
accepted by scientists and programmers: many papers on BSP algorithms were
published, research groups were formed [?], and the model was implemented
on a broad range of parallel computers. McColl [?] argues that BSP is a good
model for general purpose parallel computing because it achieves scalable parallel
performance and provides a platform for architecture-independent parallel
software.

* This research was partly supported by the grant of the GA CR No. 201/98/0717, by
the EU grant INCO-COOP 96-0195 ‘ALTEC-KIT’, and by the grant of the Ministry
of Education of CR No. OK-304.

Van Emde Boas [?] defined the first and the second machine classes. The
first class contains the deterministic Turing machine and all the other models 
which are polynomially time-equivalent and linearly space-equivalent to the Tu- 
ring Machine. This class is often understood as the class of sequential compu- 
ters. Machines belonging to the first class are physically feasible, i.e. they can 
be realized efficiently even when physical limitations (like the speed of light) are 
taken into account. On the other hand, machines for which time is polynomially 
equivalent to space of the Turing Machine form the second class. Its members 
are various kinds of massively parallel machines (PRAM) or they are not de- 
terministic (alternating Turing machine). The disadvantage of the second class 
machines is their physical infeasibility. Wiedermann introduced the class of 
weak parallel machines represented by the pipelined parallel Turing machine. 
Weak parallel machines are physically feasible, slower than members of the se- 
cond class, but faster than the computers from the first class. Their period (time 
between starting processing two inputs in a sequence of inputs) is polynomially 
equivalent to space used on a Turing Machine. 

In this paper, we study how the BSP model relates to the above-mentioned
three machine classes. We show that its computational power depends on the values
of the BSP parameters p, g, and l. When adding more processors, communication
and synchronization become more complicated, so we assume g and l to be
functions of the number of processors p. Section 2 gives definitions of BSP and
the machine classes and some technical lemmas. Membership of BSP in the
first and the second machine class is analyzed in Sect. 3. A BSP computer
belongs to the first class for $g(p), l(p) = \Omega(p^a)$ and to the second class for
$g(p), l(p) = O(\log^b p)$, where a and b are arbitrary positive constants. With an
additional assumption about the relations between complexity classes we can set
the parameters so that the resulting computer belongs neither to the first nor
to the second class, as is proved in Sect. 4. It does not even belong to the class
of weak parallel machines. Section 5 shows that BSP is either too slow (member
of the first class) or unrealistically fast (member of the second class) to exploit
the potential of physically feasible parallelism represented by the class of weak
parallel machines. The concluding Sect. 6 discusses implications for the practical
usability of the BSP model.



2 Preliminaries 

First we define the BSP computer. Then we present definitions of the first,
the second, and the weak parallel machine classes. At the end of this section we
give, without proofs, some basic technical lemmas.

Definition 1. The Bulk Synchronous Parallel (BSP) computer [?] consists
of p processors with local memories. Every processor is a RAM with the
logarithmic cost. The computation runs in supersteps. In the beginning, at most
$\min\{p, O(n)\}$ processors are active, where n denotes the problem size. The input
is spread across the local memories of the initially active processors. Additional
processors can be activated by sending messages to processors which have been
inactive until then. During a superstep, processors compute with locally held
data and perform an h-relation, i.e. they send point-to-point messages to other
processors so that no processor sends or receives more than h bits. Supersteps
are separated by a barrier synchronization. All messages sent during a superstep
are available at their destinations at the beginning of the next superstep.

The time complexity of the i-th superstep is $T_i = w_i + h_i g + l$, where $w_i$ is the
maximum number of computational operations in a processor, $h_i$ is the maximum
number of bits sent or received by any processor, g is the time spent sending
or receiving a message, and l is the time of the barrier synchronization. The
time complexity of the whole computation of S supersteps is $T = \sum_{i=1}^{S} T_i$. We
define the space complexity of a BSP computer as the sum of the space consumed
by all the processors.
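
The cost formula above can be evaluated mechanically. The following Python sketch is only an illustration of the definition; the `Superstep` record and the sample values of w, h, g and l are assumptions chosen for the example, not measurements of any machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Superstep:
    w: int  # maximum number of local operations in any processor
    h: int  # maximum number of bits sent or received by any processor


def superstep_time(step, g, l):
    """Time of one superstep: T_i = w_i + h_i * g + l."""
    return step.w + step.h * g + l


def bsp_time(steps, g, l):
    """Time of the whole computation: the sum of T_i over all supersteps."""
    return sum(superstep_time(step, g, l) for step in steps)


# Hypothetical computation of three supersteps on a machine with g = 4, l = 30.
steps = [Superstep(w=100, h=10), Superstep(w=50, h=0), Superstep(w=20, h=5)]
total = bsp_time(steps, g=4, l=30)  # (100+40+30) + (50+0+30) + (20+20+30)
```

Note that the barrier cost l is paid in every superstep, even in one that sends no messages, so the number of supersteps S matters as much as the communication volume.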

We will usually assume that p is potentially unlimited, i.e. that we have 
exactly as many processors as we need for a particular computation, although 
only some of them may be active in the beginning of the computation. In such 
a case p is a function of the problem size n. Performing communication and
synchronization is more difficult for a larger number of processors, so g and l are
not generally constants, but non-decreasing functions of p. In the following text, we
will denote a BSP computer with particular values of p, g, and l as BSP(p, g, l).
The linear upper limit for the number of initially active processors allows the 
input to be read in parallel. 

Definition 2. The first machine class $C_1$ [?] contains the deterministic Turing
Machine (DTM) and all the machines which are polynomially time-equivalent
and linearly space-equivalent to DTM. Time and space bounds need not be reached
by the same simulation.

Definition 3. The second machine class $C_2$ [?] is the class of computational
devices with the time complexity polynomially equivalent to the space complexity
of DTM.

Definition 4. The class of weak parallel machines $C_{weak}$ [?] contains machines
with period (time between beginnings of processing of two subsequent inputs)
polynomially equivalent to the space complexity of DTM.

Examples of members of $C_1$ are (multihead and multitape) deterministic Turing
Machines and the RAM with the logarithmic cost. The second class contains
e.g. the alternating Turing Machine [?] and SIMDAG (also called PRAM).
A representative of the $C_{weak}$ machines is the pipelined parallel Turing Machine
[?]. The power of weak parallel computers is in fast processing of (long) sequences
of inputs.

Lemma 1. The following conditions hold for membership in machine classes:

1. A machine $M$ is in $C_1$ iff there is a machine $M' \in C_1$ such that $M$ and $M'$
are polynomially time-equivalent and linearly space-equivalent.

2. A machine $M$ is in $C_2$ iff there is a machine $M' \in C_2$ such that $M$ and $M'$
are polynomially time-equivalent.



Lemma 2. Let there be p active processors at the beginning of a BSP computation.
Then the number of processors that can be activated in T(n) computation
steps is bounded by . 



Lemma 3. Let a machine $M$ simulate a BSP computation having S supersteps.
Assume for individual supersteps 3 c, fc > 0 Vi G {1, . . . , 5} : < c . 

Then the whole simulation takes time T-^ = O 



3 Membership in $C_1$ and $C_2$

In this section we prove membership of BSP in the first and the second machine
classes by mutual simulations of BSP, RAM, Turing Machine, and PRAM. After
several auxiliary lemmas, Theorems 1, 2, 3, and 4 establish the respective intervals
of parameter values. Note that the polynomial time and linear space overheads
required for membership in $C_1$ may be achieved by different simulations.

Lemma 4. A BSP computer can simulate any deterministic Turing Machine
with a polynomial time overhead.

Proof (sketch). The simulating BSP runs the algorithm for simulation of DTM
on RAM. Only one processor is used. □

Lemma 5. BSP and DTM can be mutually simulated with a linear space overhead.

Proof (sketch). A BSP computer with only one processor is a RAM, which can
simulate a Turing Machine in linear space.

Conversely, the DTM writes the contents of the memories of all the BSP processors
to its tape. When simulating a step, the Turing Machine sequentially performs one
step of every processor. □



Lemma 6. For any p, g, l, BSP(p, g, l) can be simulated by a Common-CRCW-PRAM
with $O(p^2)$ processors with a polynomial slowdown.

Proof. The PRAM uses p processors to simulate the p processors of the BSP.
Simulation of a superstep runs as follows:

1. local computation - a PRAM processor directly simulates the corresponding
BSP processor.

2. communication - for any pair of processors (i, j), an area is reserved in the
shared memory for messages from i to j. Then it is necessary to obtain a
list of nonempty areas for every destination processor. There is a simple
algorithm running in O(log p) time with p processors. As there are p BSP
processors, we need $p^2$ additional processors.

3. synchronization - one shared memory cell is reserved for synchronization.
Each step is augmented by a constant time check of the end of superstep.

We need the common write during the simulation to perform fast communication
and synchronization. □

Lemma 7. For a potentially unlimited number of processors and $g(p) = O(1)$,
$l(p) = O(1)$, $BSP(p, g(p), l(p)) \in C_2$.

Proof. A BSP computation is simulated on a Common-CRCW-PRAM according 
to the previous lemma. 

An EREW-PRAM can be straightforwardly simulated on a BSP by adding 
a processor for every memory cell of the PRAM. If the PRAM has p processors 
and uses s memory cells, then BSP has p + s processors, p processors simulate 
the computation of p PRAM processors. Remaining s processors only handle 
memory requests. EREW property ensures that all communication requests are 
1-relations. □ 



Theorem 1. Let p be an arbitrary constant (independent of the input size) and
$g(p)$, $l(p)$ any functions. Then $BSP(p, g(p), l(p)) \in C_1$.

Proof. A BSP with a fixed number of processors can be directly simulated on a 
RAM. Simulation runs in rounds. In each round, RAM simulates one superstep 
on all p BSP processors sequentially. This slows down the execution only by a 
constant factor p. □ 
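
The round-based simulation used in this proof can be sketched in Python as follows. This is a minimal illustration, not the construction from the paper: each BSP processor is modelled as a hypothetical function from its state and inbox to a new state and a list of outgoing messages, and the token-passing example is invented for the demonstration.

```python
def simulate_bsp_sequentially(programs, initial_states, supersteps):
    """Simulate p BSP processors on one sequential machine, round by round."""
    p = len(programs)
    states = list(initial_states)
    inboxes = [[] for _ in range(p)]
    for _ in range(supersteps):
        next_inboxes = [[] for _ in range(p)]
        # One round: run the current superstep of every processor in turn.
        for i in range(p):
            states[i], outgoing = programs[i](states[i], inboxes[i])
            for destination, message in outgoing:
                next_inboxes[destination].append(message)
        # Barrier: messages become visible only in the next superstep.
        inboxes = next_inboxes
    return states


def make_ring_program(i, p):
    """Processor i adds each received token to its total and forwards it."""
    def program(state, inbox):
        total, token = state
        if inbox:
            token = inbox[0]
            total += token
        return (total, token), [((i + 1) % p, token)]
    return program


values = [1, 2, 3, 4]
programs = [make_ring_program(i, len(values)) for i in range(len(values))]
final_states = simulate_bsp_sequentially(
    programs, [(v, v) for v in values], supersteps=len(values)
)
```

After p supersteps every simulated processor has seen all p values, so each total equals the global sum. The inner loop over the p processors is the source of the constant slowdown factor p mentioned in the proof.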



Theorem 2. Let $l(p) = \Omega(p^b)$ for some constant b > 0. Then for any p and
$g(p)$, $BSP(p, g(p), l(p)) \in C_1$.

Proof. We simulate a BSP computation on a RAM. In every superstep, the
RAM sequentially simulates all the processors of the BSP. A superstep takes time
$T^{BSP} = w + h + \Omega(p^b)$ on the BSP and $T^{RAM} = pw + ph$ on the RAM. We analyze
three cases:

1. $w > p \;\&\; w > h \Rightarrow T^{RAM} \le 2w^2 \le 2 (T^{BSP})^2$

2. $p > w \;\&\; p > h \Rightarrow T^{RAM} \le 2p^2 = 2 (p^b)^{2/b} \le O((T^{BSP})^{2/b})$

3. $h > w \;\&\; h > p \Rightarrow T^{RAM} \le 2h^2 \le 2 (T^{BSP})^2$

We have got $T^{RAM} = O((T^{BSP})^{\max\{2, 2/b\}})$ for each superstep. A polynomial
time overhead for the whole computation is obtained by application of Lemma 3.
Lemma 5 gives the linear space overhead simulation. A reverse simulation of
RAM on BSP with polynomial time overhead follows from Lemma 4. □

Theorem 3. Let $g(p) = \Omega(p^a)$ for some constant a > 0 and $l(p) = \Omega(1)$.
Further let us assume that at least one message is sent in each superstep of
every BSP computation. Then $BSP(p, g(p), l(p)) \in C_1$.

Proof. The only nontrivial part of the proof is the simulation of the BSP on a
RAM. On BSP, one superstep takes time $T^{BSP} = w + gh + l = w + h\,\Omega(p^a) + \Omega(1)$,
where $h \ge 1$. The RAM sequentially simulates the BSP processors. It takes time
$T^{RAM} = pw + ph$. Further we analyze several cases:

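1. $w \le p \Rightarrow T^{RAM} \le p^2 + ph = O(hp^2)$

   a) $a \ge 2 \Rightarrow T^{BSP} \ge \Omega(hp^a) \ge \Omega(hp^2) \ge \Omega(T^{RAM})$

   b) $a < 2 \Rightarrow (T^{BSP})^{2/a} \ge \Omega(h^{2/a} p^2) \ge \Omega(hp^2) \ge \Omega(T^{RAM})$

2. $w > p \Rightarrow T^{RAM} \le p(w + h) \;\&\; T^{RAM} \le w^2 + ph$, and
   $(T^{BSP})^2 \ge w^2 + h^2\,\Omega(p^{2a})$

   a) $a < 1/2$

      i. $h \le w \Rightarrow T^{RAM} \le w^2 + pw \le 2w^2 \le 2 (T^{BSP})^2$

      ii. $h > w \Rightarrow T^{RAM} \le 2ph \le h\,(\Omega(p^a))^{1/a} \le (h\,\Omega(p^a))^{1/a} \le (T^{BSP})^{1/a}$

   b) $a \ge 1/2 \Rightarrow (T^{BSP})^2 \ge w^2 + h^2\,\Omega(p) \ge w^2 + h\,\Omega(p) \ge \Omega(T^{RAM})$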

We have got a simulation of one BSP superstep on a RAM with a polynomial
overhead: $T^{RAM} = O((T^{BSP})^{\max\{2, 2/a\}})$. Consequently, the whole computation
is slowed down at most polynomially, according to Lemma 3. □



Theorem 4. Let us assume a $BSP(p, g(p), l(p))$ of time complexity $T(n) =
\Omega(\log n)$ with $g(p) = O(\log^a p)$, $l(p) = O(\log^b p)$ for some constants a, b > 0
and a potentially unlimited number of processors. Then $BSP(p, g(p), l(p)) \in C_2$.

Proof. We simulate $M = BSP(p, 1, 1)$, which is in $C_2$ according to Lemma 7,
running in time $T^M(n)$. Assume also that $M$ runs in S supersteps. We show that the
simulating algorithm runs on the $BSP(p, g(p), l(p))$ in time $T^{BSP} = O((T^M(n))^k)$
for a constant k > 0. The following inequality holds:

$$T^{BSP} \le \sum_{i=1}^{S} w_i + g(p) \sum_{i=1}^{S} h_i + S\,l(p) \le g(p)\,T^M(n) + S(n)\,l(p).$$

Clearly $T^M(n) \ge S(n)$, and according to Lemma 2 the number of processors is
bounded so that $\log p = O(T^M(n) + \log n)$. Let us define $c = \max\{a, b\}$.

$$T^{BSP} \le T^M(n)\,(\log^a p + \log^b p) \le O\big(T^M(n)\,(T^M(n) + \log n)^c\big)$$

The assumption $T^M(n) = \Omega(\log n)$ yields $T^{BSP} \le O((T^M(n))^{c+1})$. We get
the desired constant k = c + 1.

The reversed simulation of BSP(p, g, l) on $M = BSP(p, 1, 1)$ runs clearly in
time at most $T^{BSP}$. □

4 BSP Outside the First and the Second Class 

The interval of parameter values between $\log^k p$ and $p^a$ has remained unexamined
in the previous section. The membership of BSP with such parameters
in the classes $C_1$ and $C_2$ relies on the unproved relation between the complexity
classes P and PSPACE, resp. LINSPACE¹. We show that, under the assumption
that there are problems with an exponential time lower bound in LINSPACE, BSP
with g, l from the above interval belongs to neither $C_1$ nor $C_2$. On the
other hand, if P = PSPACE, then $BSP \in C_1 = C_2$ would hold trivially.

Theorem 5. Let us suppose that there is a problem $P \in LINSPACE$ with an
exponential time lower bound.

Let² $\forall a > 0\ \forall k > 0: \omega(\log^k p) \le g(p) \le o(p^a) \;\&\; \omega(\log^k p) \le l(p) \le o(p^a)$.

Then $BSP(p, g(p), l(p)) \notin C_1 \;\&\; BSP(p, g(p), l(p)) \notin C_2$.

Proof. Assume that P can be solved on the Turing Machine in time TIME(n) = 
C(c^) and space SPACE(n) = 0{n), where c > 1 is some constant. P can be 
solved in parallel on an EREW-PRAM - and hence on BSP(p, 1,1), see the proof 
of Lemma Q- in time //-TIME(n) = 0(n*) with p = //-PROC(n) = 0(c^") 
processors (Z > 1 is a constant) using the standard transitive closure algorithm 
ynH . To achieve parallel time //-TIME(n) = O(n^) the number of processors 
must be //-PROC(n) = lV(c"), because it is necessary to perform at least the 
same number of operations as in the sequential computation. 

In the worst case, time needed for simulation of BSP(p, 1, l)-computation 
on BSP{p, g{p),l{p)) is BSP-TIME(n) = o{n^p°‘) = o{n^c^°'^) < o(c‘^“"'). This 
inequality holds Va > 0, therefore the speedup is larger than polynomial and 
hence BSP(p, g(p), /(p)) ^Ci. 

Moreover, any $BSP(p, g(p), l(p))$ algorithm, even just one superstep with
any h-relation ($h \ge 1$), slows down the computation more than polynomially:
$\forall k > 0: \text{BSP-TIME}(n) = \omega(\log^k p) = \omega(\log^k c^n) = \omega(n^k \log^k c) \ge \omega(n^k)$. This
means that any BSP with enough processors has too slow communication to
meet the polynomial time bound, hence $BSP(p, g(p), l(p)) \notin C_2$. □

5 Relation to the Weak Parallel Machines 

The class $C_{weak}$ lies between $C_1$ and $C_2$. So a question naturally arises, whether
BSP with g and l larger than $\log^k p$ and smaller than $p^a$ could fit into $C_{weak}$.
The following theorem gives a negative answer.

¹ class of problems solvable sequentially in linear space
² $f(n) = o(g(n)) \Leftrightarrow \lim_{n \to \infty} f(n)/g(n) = 0$;
  $f(n) = \omega(g(n)) \Leftrightarrow \lim_{n \to \infty} g(n)/f(n) = 0$

Theorem 6. Let the assumptions of Theorem 5 hold. Additionally, let every
period contain at least one superstep. Then $BSP(p, g(p), l(p)) \notin C_{weak}$.

Proof. Let us take a sequence of N inputs, each having length n. The sequence
is processed sequentially in exponential time $N \cdot TIME(n)$ and linear space
SPACE(n). The execution time of a pipelined parallel computation is
$P\text{-}TIME(n) + (N - 1)\,PERIOD(n)$, where P-TIME(n) is the (exponential) pipeline
time complexity and PERIOD(n) is the (polynomial) period. The parallel computation
has to perform at least the same number of operations as the sequential
computation, therefore the number of processors is
$p \ge N \cdot TIME(n) / (P\text{-}TIME(n) + (N - 1)\,PERIOD(n))$. From the pipelined
computation thesis we get the equality $PERIOD(n) = O(SPACE^k(n))$ for some $k \ge 1$.
The limit $\lim_{N \to \infty} p \ge TIME(n)/SPACE^k(n)$ gives an exponential number of
processors. It can be shown (in the same way as $BSP \notin C_2$ in the proof of
Theorem 5) that the period grows more than polynomially when the pipelined
computation is simulated on a $BSP(p, g(p), l(p))$ computer, and consequently
$BSP(p, g(p), l(p)) \notin C_{weak}$. □
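
The processor bound in this proof is easy to explore numerically. The Python sketch below merely evaluates the two expressions from the proof; the function names and the sample figures are assumptions made for illustration.

```python
def pipelined_time(n_inputs, pipeline_time, period):
    """Execution time of a pipelined run: P-TIME + (N - 1) * PERIOD."""
    return pipeline_time + (n_inputs - 1) * period


def min_processors(n_inputs, sequential_time, pipeline_time, period):
    """Lower bound on p: total sequential work divided by the pipelined time."""
    work = n_inputs * sequential_time
    return work / pipelined_time(n_inputs, pipeline_time, period)


# Hypothetical figures: TIME = P-TIME = 1024 steps per input, PERIOD = 4 steps.
single_input = min_processors(1, 1024, 1024, 4)     # no pipelining gain yet
long_sequence = min_processors(10**9, 1024, 1024, 4)  # close to TIME / PERIOD
```

For a single input the bound is 1, while for long sequences it approaches TIME(n)/PERIOD(n), which is exponential when the sequential time is exponential and the period is polynomial.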

6 Conclusion 

The BSP model has been recognized as a practically usable model for general
purpose parallel computing [?]. It allows the development of efficient, yet portable
and machine-independent parallel programs. In this paper, we have shown the
asymptotic power and limitations of the BSP. The computation power of the BSP 
computers can be tuned in a wide range by setting the parameters. If we allow 
for enough (exponentially many) processors and very fast communication and 
synchronization mechanisms, we get a machine from the second class, comparable 
to PRAM, see Theorem 4. The problem is that such a computer violates basic
laws of physics such as the limited speed of light and a lower bound on the
processor size. Using arguments from [?], it can be easily proved that in a realistic
BSP with a large number of processors, g and l have to be $\Omega(\sqrt{p})$. As we proved
in Theorems 2 and 3, such a BSP belongs to the first class. It can still exploit
some parallelism, but with much less speedup than the machines from $C_2$.

The members of $C_{weak}$ are also physically feasible and allow fast processing
of sequences of inputs [?]. Unfortunately, the BSP model does not belong to the
weak parallel class, as has been shown in Theorem 6. Intuitively, in a general BSP,
this is caused by the lack of locality of communication. There is an equal distance
between any pair of processors. This distance corresponds to the size of the
computer and grows with the number of processors. On the other hand, machines
in $C_{weak}$ utilize the possibility of fast communication between near processors.
Distant processors still communicate slowly, but direct neighbors can communicate
in constant time. We may conclude that an efficient realistic model of a
parallel computer should include some notion of distance among its processors.
One example is the variable-delay PRAM [?]. Other possibilities would be to
allow fast communication and synchronization of BSP submachines of limited
size or to define a neighborhood relation on BSP processors and to augment the
standard BSP by a fast neighbor-to-neighbor communication mechanism.

Abstract. This paper introduces the Hypermedia Modeling Language
(HML), which constitutes a formal basis for modeling the navigation/manipulation,
synchronization and media channel handling functionality of
applications having some hypermedia features.

We present a logical application architecture framework¹ which forms a
principal basis for the HML language. It defines individual hypermedia 
aspects and determines their position within the application architecture. 
Using this framework in the application development process improves 
transparency of the application architecture and leads to a higher degree 
of reuse and portability as well as to the ease of maintenance of the 
application. 

For each layer of the framework, we discuss basic principles of its mode- 
ling in the HML. 



1 Introduction 

A number of modern software applications are characterized by some hypermedia
features (e.g. navigation or multimedia handling) supported by some technology
(e.g. DB, OS, programming languages or GUI). However, these features have no
clear support in either the application architecture or the “classical” software
engineering application development process. Implementation of these hypermedia
features is often mixed with the responsibilities of the business logic and the user
interface objects (widgets).

On the other hand, the theory of hypermedia (including hypertext and multimedia)
is well developed and offers models of particular hypermedia features,
systems and applications, e.g. the Nested Context Model [5], the Dexter Hypertext
Reference Model [?], the Trellis hypertext model [?] or the Amsterdam Hypermedia
Model [?]. The design of hypermedia applications can also be supported by some of
the existing methodologies, e.g. RMM [?], SHDT [?] or OOHDM [?]. However,
these approaches deal mainly with the aspects of external design of applications
and do not deal with the business logic or the domain modeling.

This paper identifies and specifies particular aspects of “hypermediality” and
determines their responsibilities and locations in the application architecture.
This principle is also reflected in our proposal of the logical application
architecture framework. Applying this framework allows us not to differentiate
between “pure” hypermedia and non-hypermedia applications from the architecture
point of view. We refer to both kinds simply as applications.

¹ “Logical” means that it deals with the logical aspects covered by an application, not
with the physical deployment of the system described, for example, by the
client-server architecture.

For the navigation/manipulation, synchronization and media channels layers of
the architecture we define their metamodels and an implementation-independent
visual modeling language, the Hypermedia Modeling Language (HML).

The HML is designed as a UML variant [?], a language with well-defined
semantics that is built on top of the UML metamodel [?]. In this way the HML
inherits all facilities of the UML and extends its complex application modeling
by the hypermedia dimension.

In the next section we outline our proposal of the logical application
architecture framework supporting hypermedia. The subsequent sections describe
the domain and application, navigation/manipulation, synchronization and media
channels models and the presentation modeling principles. Finally, we discuss
the implementation of the HML in a CASE tool.

2 Logical Application Architecture Framework 

From the logical point of view, we can identify different aspects of an application:
domain, application logic, navigation/manipulation, media channels, presentation
and synchronization. These aspects can be arranged into layers constituting
the framework for an application architecture, see Fig. 1.

The domain layer implements the information structure and mechanisms of 
the problem domain independently of the application’s intention. 

The application layer implements a solution of a specific problem - busin- 
ess logic, functionality and overall control of the application. It is intermediate 
between the domain layer and the user interface layer. 

The navigation/manipulation layer defines the structure of the information 
presented to an actor and controls the manipulation of domain and application 
objects. 

The media channels layer defines properties and structure of I/O devices 
used. It also defines the mapping of navigation objects onto these devices. 

The synchronization layer deals with temporal synchronization of navigation 
objects. 

The presentation layer controls presentation of navigation objects and media 
channels to an actor. 

The actor is an entity residing outside the application and communicating 
with the application. Usually it is a user or another cooperating application. 

The system represents an operating system, a database, a programming lan- 
guage or a hypermedia system, upon which the application is built. 

The implementation of an application does not need to support all layers.
For example, if an application does not use multimedia data, then the media
channels and synchronization layers are not needed.

[Figure: layered architecture diagram with the labels Domain, System and User Interface]

Fig. 1. The generic application architecture. Arrows represent the relationship “uses”.



Furthermore, this framework also clarifies the application development process
by allowing each layer to be built separately, using mainly the facilities
offered by the underlying layers. The directions of the “uses” relationships among
particular layers are depicted in Fig. 1 by arrows.

Each layer is represented by its own model. In the following sections, we shall 
describe each of them in more detail. 



3 Domain and Application Model 

Software engineering provides a number of techniques for modeling the domain
and the application layers. We shall therefore not discuss this topic here. It
suffices to mention that within the context of the HML we shall use for this
purpose the modeling techniques of the UML. The domain model is expressed
mainly by static structure diagrams and by statechart diagrams. The application
layer is modeled by use case diagrams and by realizations of use cases expressed
by collaboration or sequence diagrams [?].

To demonstrate the modeling principles described in this paper, we shall 
use a toy example of the Library Information System (LIS). It is a hypertext 
information system used for administration of the library archive, readers, book 
reservations and book borrowings. 

Fig. El shows its domain model. This model is considerably simplified; for 
instance class operations or constraints are not specified. This model constitutes 
just a part of the conceptual model. Still it suffices for our purpose. 

The LIS application aims to register readers (Reader class), books (Book
class), current book reservations (Reservation class) and current book
borrowings (Borrowing class). Books are partitioned according to the area of
interest (Area class). Areas themselves are organized into a heterarchy
(directed acyclic graph) based on the subarea-superarea relationship. Readers
are a special case of Persons, which are also used for keeping the information
about book authors.

[Figure 2: UML class diagram. Classes and attributes:
  Person (firstName : String, lastName : String)
  Reader, a specialization of Person (address : Text, phone : String,
    birthday : Date, idCard : String, readerID : Integer)
  Reservation (since : Date, till : Date)
  Borrowing (since : Date)
  Area (name : String), with a subarea/superarea association and the
    constraint {areas structure has to be DAG}
  Book (title : String, publisher : String, abstract : Text,
    nbPages : Integer, year : Year)
 Role names shown: author, borrower.]

Fig. 2. Simplified domain model of the LIS application
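As an informal illustration only, part of the domain model of Fig. 2 could be
written down as plain Python classes. This is a sketch under assumptions: the
class and attribute names follow the figure, but the Python field names, the
`add_subarea` and `reaches` helpers, and the way the DAG constraint is enforced
are hypothetical and not part of the HML or of the LIS design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    first_name: str
    last_name: str


@dataclass
class Reader(Person):
    # Reader specializes Person, as in the domain model.
    address: str = ""
    phone: str = ""
    reader_id: int = 0


@dataclass
class Area:
    name: str
    subareas: List["Area"] = field(default_factory=list)

    def reaches(self, target: "Area") -> bool:
        # True if `target` can be reached by following subarea edges.
        return any(s is target or s.reaches(target) for s in self.subareas)

    def add_subarea(self, sub: "Area") -> None:
        # Enforce the constraint from Fig. 2: the areas structure must stay a DAG,
        # so an edge that would close a cycle is rejected.
        if sub is self or sub.reaches(self):
            raise ValueError("areas structure has to be a DAG")
        self.subareas.append(sub)


@dataclass
class Book:
    title: str
    authors: List[Person] = field(default_factory=list)  # role "author"
    areas: List[Area] = field(default_factory=list)      # assumed association
```

Checking the constraint at insertion time is only one possible design choice; a
modeling tool could equally validate the whole structure after editing.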



4 Navigation and Manipulation Model 

Information and functionality contained within the domain and the application
layers would be useless if not accessible by the actors of the application.
Customizing the access and structuring of this information according to the
needs of a particular actor type is the responsibility of the
navigation/manipulation layer.

The navigation/manipulation layer consists of navigation objects, which are
called views and hyperlinks in the HML. Modeling of these navigation elements,
as well as modeling of navigation contexts, accessing structures and
manipulation, is outlined in the following subsections.



4.1 Views 

The view is an object belonging to the navigation layer with the responsibility to 
structure information and access the behavior offered by domain or application 
objects. In addition, it can implement the user interface behaviour; e.g. it can 
dynamically compute the navigation, record and show the navigation history or 
check the entry values in input forms and dialogues. 

The view class (v-class) generically describes the structure and behaviour
of all views of the same kind. The HML denotes the v-class by the stereotype
«view» of a class, or by an icon (as in Fig. 3 and Fig. 4). The view is an
instance of a v-class.

Each view has its owner, the domain or application object whose properties it
presents. An owner can have many views, depending on the number of ways it
needs to be viewed. This can be perceived as viewing polymorphism.

A v-class specifies, besides its own attributes and operations, also the
structure of the information presented by the view. This information contains a
collection, or a more complex structure, of inclusions - specifications of views
which are included (presented) in the current view.

An inclusion can have its own name, determining the role of the included
view in an instance of the including v-class, and a specification of the
included view, which can be computed at run time. In this way it is possible to
specify a dynamic structure of views.

For example, the main view of the class Person (v-class Person~main),
depicted in Fig. 3, collects the two Person attributes firstName and lastName
together. The v-class Book~bibItem includes, besides other inclusions, the
v-class Person~main as a list of the book’s authors. In this case, the inclusion
is specified in the alternative way, as an aggregation with the stereotype
«hyper».

Fig. 3. (a) Reuse of the v-class Person~main in defining the v-class Book~bibItem
(b) Instance of Book~bibItem



Employing the inclusion mechanism combined with the template class of the
UML, we can define composite v-classes representing reusable, parametrizable
components of the navigation model.

Since v-classes are also ordinary classes, we can define very complex
structures of the navigation model by inheritance and composition
relationships.

4.2 Hyperlinks 

In the context of the HML a hyperlink constitutes a connection among views in
the navigation layer. At the modeling level, a set of hyperlinks of the same kind is
specified by a hyperlink association (h-association). The h-association is modeled
by a navigable association with the stereotype «hyper». The hyperlink is an
instance of an h-association.

Each h-association defines a set of source specifiers and a set of destination
specifiers. They are resolved into concrete views at run-time.

Additionally, an anchor is specified for each destination. It is a view which
is included into the source view at the place where the hyperlink is activated.
In most cases this is another, simpler view of the destination view’s owner.

The HML model supports in-line (WWW-like) and out-of-line (DHM-like) links.
In addition, the HML makes it possible to model n-ary hyperlinks, dynamically
(at run-time) computed hyperlinks, and link contexts, which are not described
in this paper.

Fig. 4 illustrates, among other things, the definition of a set of unary
in-line hyperlinks leading from instances of the Area~bookList v-class to
views of type Book~main, with the anchor Book~simpleBibItem.

This example shows both the graphical and the textual forms of specifying
the hyperlinks, which are semantically equivalent.

4.3 Navigation Contexts and Accessing Structures

The HML also provides for modeling of complex navigation contexts and
access structures. To illustrate these possibilities of the HML on the LIS
application, we show an example of modeling a hierarchical index, depicted
in Fig. 4.

Fig. 4. (a) A hierarchical book index of an area - Area~bookList (b) An example of
an Area~bookList instance



The v-class Area~bookList represents an index of all books belonging to the
current area of interest. All its subareas and all their books also belong to this
area. This recursion is realized by the inclusion of the v-class Area~bookList
into itself. The termination of the recursion is guaranteed by the constraint
specified in the domain model (the areas structure has to be a directed acyclic
graph).
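The recursive inclusion can be pictured with a small sketch. It assumes,
purely for illustration, that the area structure is given as two dictionaries
(area name to subarea names, area name to book titles) and that an index is
rendered as indented text lines; the function `book_list` and this
representation are hypothetical and are not HML notation.

```python
from typing import Dict, List


def book_list(area: str,
              subareas: Dict[str, List[str]],
              books: Dict[str, List[str]],
              depth: int = 0) -> List[str]:
    """Render the index of `area`: its books, then the index of each subarea."""
    lines = ["  " * depth + area]
    for title in books.get(area, []):
        # Each entry stands for an anchor leading to the book's main view.
        lines.append("  " * (depth + 1) + "-> " + title)
    for sub in subareas.get(area, []):
        # The view includes itself for every subarea; the recursion ends
        # because the area structure is a DAG (no area is its own descendant).
        lines.extend(book_list(sub, subareas, books, depth + 1))
    return lines


subareas = {
    "Science": ["Mathematics", "Computer Science"],
    "Mathematics": ["Algorithms"],
    "Computer Science": ["Algorithms"],
}
books = {"Mathematics": ["Calculus"], "Algorithms": ["Sorting and Searching"]}
index = book_list("Science", subareas, books)
```

Because "Algorithms" has two superareas, it is listed under both of them; this
is the expected behaviour for a DAG rather than a tree.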

4.4 Manipulation 

The manipulation facility of the navigation/manipulation layer is a possibility 
to change the domain layer by operations performed on views by an actor. 

This is specified by manipulation properties attached to the whole v-class
(valid for all its inclusions) or to particular inclusions. A manipulation
property specifies an operation that can be performed on the inclusion. The
following manipulation properties are predefined:

in     the inclusion is input only (allows values to be entered)
out    the inclusion is output only (shows values); the default behaviour
inout  the inclusion is both input and output
copy   the inclusion can be copied
paste  a copied view can be pasted at the inclusion’s position
drag   the inclusion can be dragged
drop   a dragged view can be dropped at the inclusion’s position



It is also possible to use additional (specific) manipulation properties in case the 
developer needs them and the implementation environment supports them. 

Each manipulation property can have a value specifying the action (message 
sending) performed when the given manipulation event occurs. 

5 Synchronization Model 

The HML offers a simple, yet sufficiently general and expressive, multimedia
synchronization model based on temporal relationships. A similar principle of
synchronization is also used in the CMIFed and HPAS systems and in the W3C
standard for the presentation of multimedia objects, SMIL.

A temporal relation, in the sense of the HML, is a stereotyped association
among v-classes expressing a temporal ordering of their instances, possibly
defined in the context of some including v-class. There are three kinds of
temporal relationships (UML stereotypes): SerialLink, StartSync and EndSync.
Their notation and semantics are depicted in Fig. 5. The SerialLink represents
a sequential temporal ordering with a delay. The StartSync makes the
presentations of views start at the same time, possibly with a delay. The
EndSync makes the presentations of views terminate at the same time, possibly
with a delay.
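One possible reading of the three relationships is sketched below. It assumes
that each presentation occupies a time interval of known duration and that
each relationship carries a single delay parameter; the names `Interval`,
`serial_link`, `start_sync` and `end_sync`, and the assignment of the delays
d, p and q to the relationships, are assumptions made for illustration and are
not taken from the HML definition.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: float
    end: float


def serial_link(a: Interval, duration: float, d: float = 0.0) -> Interval:
    # Sequential ordering: the second presentation starts d after the first ends.
    start = a.end + d
    return Interval(start, start + duration)


def start_sync(a: Interval, duration: float, p: float = 0.0) -> Interval:
    # Synchronized start: the second presentation starts p after the first starts.
    start = a.start + p
    return Interval(start, start + duration)


def end_sync(a: Interval, duration: float, q: float = 0.0) -> Interval:
    # Synchronized end: the second presentation ends q after the first ends.
    end = a.end + q
    return Interval(end - duration, end)


video = Interval(0.0, 10.0)
credits = serial_link(video, duration=5.0, d=2.0)   # plays after the video
subtitle = start_sync(video, duration=4.0, p=1.0)   # starts 1 s into the video
jingle = end_sync(video, duration=3.0)              # ends together with the video
```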

Fig. 5. Temporal relationships; d, p and q are delay parameters

6 Media Channels Model

The application contains data of different media types. Each data item has to
be presented (played) on some output device. The media channel (simply channel)
is an abstract input/output device for playing events. This may be, for example,
a window or a frame on the screen, or an audio or a video output. When the
application is running, the channels are mapped onto physical input/output
devices.

The channel type represents all channels having the same properties,
responsibilities and behaviour. It is modeled by a class defining the properties
of a channel which are accessible to the application. These properties are
expressed by attributes representing the relevant properties of physical devices
(size and position of a window; volume, quality and balance of an audio channel,
etc.) and by operations intended to exploit the functionality of the
instantiated channels (open, play, close, stop the channel, etc.).

The HML denotes the channel type by the stereotype «channel» of a class,
or by an icon (as in Fig. 6). The channel type may be a composition of others
(e.g. a view consists of many frames) or it can share properties of more
general channel types by the generalization relationship.

Fig. 6(a) shows an example of a structure of channel types for the LIS
application. The ApplicationWindow represents the main application window
divided into three frames: the MenuFrame containing the menu, the IndexFrame
(dis)playing the presentation of indices of different kinds (e.g. the book index,
the author index, the reader index, etc.) and the DataFrame displaying the main
views of a given domain object selected in the IndexFrame. The Workspace
channel is instrumental in displaying all dialogue types placed on the screen.

The static mapping of views onto a channel is specified by a dependency
with the stereotype «play», leading from the v-class to the channel class.
Another kind of mapping of a view to a channel can be represented by a
presentation property (e.g. named channel, play or target) whose value specifies
the desired channel (it can be computed at run-time).

An example of a channel mapping is depicted in Fig. 6(b). The main view
of the Reader will be played on the DataFrame channel.

[Figure 6: (a) channel types including Workspace and ApplicationWindow;
(b) the v-class Reader~main with a «play» dependency to DataFrame.]

Fig. 6. (a) Channel hierarchy of the LIS application (b) Mapping a view to a channel



7 Presentation Model 

The presentation model defines the mapping of navigation objects and channels
onto user interface objects (widgets) and their properties. The presentation is
defined by attaching a presentation specification, represented by a set of
property specifications (in the sense of the UML), to the whole view, to a
particular inclusion of a view, to link anchors and targets, or to a channel.

The presentation specification depends to a large extent on the target
implementation environment. Therefore the set of predefined presentation
properties and the specification of the syntax and the semantics of their values
are out of the scope of the HML.

8 HML Implementation 

The fact that the HML is a UML variant and that many of its modeling
constructs use the UML extension mechanisms implies that software support for
the HML modeling process can be implemented by modifying an existing CASE tool
that supports the UML and its extension mechanisms. We have chosen Rational
Rose 98, which best fits our requirements.

We have already implemented the core of the navigation, synchronization and
media channels modeling. Now (August 1998), we are working on the
implementation of the model management functionality (e.g. browsing of HML
elements or finding model inconsistencies) and of a code generator targeting
HTML supported by JavaScript.

9 Conclusions 

We have presented a brief introduction to the Hypermedia Modeling Language
- HML. The HML concerns modeling of the navigation/manipulation, the
synchronization and the media channels handling functionality of applications
with such hypermedia features.

A detailed description of the syntax, the semantics and the notation cannot be
presented within the size of this paper. A more detailed description of the HML
will appear in the following forthcoming documents: the HML Summary
introducing the background of the language, the HML Semantics defining the
semantics and abstract syntax by the HML Metamodel, and the HML Notation
describing the concrete graphical syntax of the HML.

Our future work will concern refining the definition of navigation and
interface design patterns using the HML. We have already defined some
navigation patterns of simple access structures (such as index, guided tour or
indexed guided tour). The HML therefore also appears to be of significant help
in the area of hypermedia pattern languages.



Abstract. Klee’s measure problem is to compute the volume of the
union of a given set of n isothetic boxes in a d-dimensional space. The
fastest currently known algorithm for this problem, developed by
Overmars and Yap, runs in time $O(n^{d/2} \log n)$. We present an alternative
simple approach with the same asymptotic performance. The exposition
is restricted to dimensions three and four.



1 Introduction 

The measure problem for a d-dimensional Euclidean space, as proposed by Klee,
is defined as follows: there is given a collection of n isothetic boxes (their
edges are parallel to the coordinate axes) and the task is to compute the measure
of their union. Klee gave an $O(n \log n)$-time algorithm for the case d = 1,
and later Fredman and Weide proved that this time bound is optimal in
the algebraic computation tree model. For d = 2 an $O(n \log n)$-time algorithm
was given by Bentley. A straightforward extension of his approach leads to
an algorithm that for arbitrary $d \ge 3$ achieves time performance
$O(n^{d-1} \log n)$. Van Leeuwen and Wood showed how to decrease it to
$O(n^{d-1})$. Subsequently this was significantly improved by Overmars and Yap
to the time bound $O(n^{d/2} \log n)$, the fastest algorithm known so far for
$d \ge 3$. The question of optimality of computing the volume of a set of
isothetic boxes in the Euclidean space of dimension d > 2 still remains an open
problem.

We present another approach to the measure problem, which is conceptually
simpler. Only the special cases of dimension three and four are described, for
which algorithms are developed operating in time $O(n^{3/2} \log n)$ and
$O(n^2 \log n)$, respectively. Generalizing for higher dimensions d, one can
obtain algorithms operating in time $O(n^{d/2} \log n)$, thus matching the
performance of the algorithm of Overmars and Yap. This will be presented in the
final version of this paper.

* This work was supported by the contract KBN 8 T11C 036 14.

A natural approach to the measure problem is to use the plane-sweep
technique, as follows. First select one of the d coordinates, say x. Consider the
set X of all x-coordinates of the corners of all boxes. Sort X in non-decreasing
order, thus creating the event-point schedule. Next, view the algorithm as
moving a (d − 1)-dimensional hyperplane perpendicular to the x-axis through all
the consecutive points in X, maintaining the sweep-plane status, in our case the
measure of the intersection $K_x$ of the boxes with the hyperplane. We will refer
to $K_x$ as a cut. Notice that between two consecutive events $x_1, x_2 \in X$,
that is, for $x \in (x_1, x_2)$, the measure of the cut $K_x$ is constant.
Therefore the interval $(x_1, x_2)$ contributes
$(x_2 - x_1) \cdot \mathrm{Measure}(K_x)$ to the total volume, where x is an
arbitrary point from $(x_1, x_2)$. Adding these terms together we obtain the total
measure of the union of the boxes.
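The summation over consecutive events can be illustrated for d = 2, where the
cut is a set of intervals. The sketch below is a deliberately naive version: it
recomputes the measure of the cut from scratch in every slab instead of
maintaining it with a data structure, so it shows only the sweep formula, not
the efficient algorithm. The function names and the tuple layout
`(x1, x2, y1, y2)` are assumptions made for this example.

```python
def union_length(intervals):
    """Total length of the union of closed intervals given as (lo, hi) pairs."""
    total, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            # A gap: close the current run and start a new one.
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


def union_area(rects):
    """Area of the union of rectangles (x1, x2, y1, y2) by sweeping along x."""
    events = sorted({x for r in rects for x in (r[0], r[1])})
    area = 0.0
    for x1, x2 in zip(events, events[1:]):
        # Between two consecutive events the cut is constant: it consists of
        # the y-intervals of all rectangles spanning the slab (x1, x2).
        cut = [(r[2], r[3]) for r in rects if r[0] <= x1 and x2 <= r[1]]
        area += (x2 - x1) * union_length(cut)
    return area
```

For two 2 × 2 squares overlapping in a unit square, `union_area([(0, 2, 0, 2),
(1, 3, 1, 3)])` adds the slab contributions 2, 3 and 2.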

Each event $x \in X$ is a coordinate of a side of some box, which means that,
during the hyperplane sweep, when it passes x, we need to either add a
(d − 1)-dimensional box to the cut $K_x$ or remove such a box from $K_x$. We
have 2n such events, so the total time complexity will be bounded by
$O(n \cdot t)$, where t is a bound on the time needed to update the measure of
$K_x$. In this way the d-dimensional measure problem reduces to the problem of
maintaining the total measure of a varying set of boxes in (d − 1)-dimensional
space. More specifically, we need to design a data structure that implements
the following operations on (d − 1)-dimensional boxes:

Insert: Add a box to $K_x$;

Delete: Remove a box from $K_x$;

Measure: What is the current (d − 1)-dimensional measure of $K_x$?
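For the one-dimensional case, where the boxes are intervals, the three
operations can be supported by a standard segment tree with cover counters.
The sketch below illustrates only the interface; it is not the trellis quadtree
of this paper. The class name `CoverTree` is hypothetical, and it is assumed
that all interval endpoints are known in advance and passed to the constructor
(at least two distinct values).

```python
class CoverTree:
    """Maintains the length of the union of intervals under insert/delete."""

    def __init__(self, coords):
        # Sorted distinct endpoints; consecutive pairs are elementary intervals.
        self.xs = sorted(set(coords))
        m = max(1, len(self.xs) - 1)
        self.count = [0] * (4 * m)     # intervals fully covering the node
        self.length = [0.0] * (4 * m)  # covered length inside the node

    def _update(self, node, lo, hi, a, b, delta):
        # The node represents the segment [xs[lo], xs[hi]].
        if b <= self.xs[lo] or self.xs[hi] <= a:
            return  # disjoint: nothing changes in this subtree
        if a <= self.xs[lo] and self.xs[hi] <= b:
            self.count[node] += delta
        elif hi - lo > 1:
            mid = (lo + hi) // 2
            self._update(2 * node, lo, mid, a, b, delta)
            self._update(2 * node + 1, mid, hi, a, b, delta)
        # Recompute the covered length of this node.
        if self.count[node] > 0:
            self.length[node] = self.xs[hi] - self.xs[lo]
        elif hi - lo == 1:
            self.length[node] = 0.0
        else:
            self.length[node] = self.length[2 * node] + self.length[2 * node + 1]

    def insert(self, a, b):
        self._update(1, 0, len(self.xs) - 1, a, b, +1)

    def delete(self, a, b):
        # Only intervals inserted earlier may be deleted.
        self._update(1, 0, len(self.xs) - 1, a, b, -1)

    def measure(self):
        return self.length[1]
```

Each insertion or deletion touches a logarithmic number of nodes, and the
measure is read off the root in constant time.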

Our algorithm follows this approach, the novelty being in the way the
sweep-plane status is maintained: to this end we use a variant of a quadtree. A
quadtree is a data structure used extensively in computational geometry and
graphics; it was introduced by Finkel and Bentley for solving range searching
problems. A d-dimensional quadtree is a composite data structure consisting of
the following components: first, there is a skeleton tree; second, there are
d-dimensional boxes $C_v$ assigned to each node v, called cells; and third,
there are some additional data structures assigned to each node. Usually all the
boxes assigned to the nodes on the same level form a partition of the box
assigned to the root.



2 Dimension Three 

The solution of Overmars and Yap of the measure problem, specialized to
dimension three, yields a version of 2-D trees to maintain the area of rectangles.
These trees have $O(\sqrt{n})$ insertion and deletion time. Below we describe
another version of the quadtree, which we call the trellis quadtree, and which
also has an $O(\sqrt{n})$ time bound for insertions and deletions. The trellis
quadtree is conceptually simpler than the original construction of Overmars and
Yap for handling rectangles.

Suppose 4n points in the plane are given, the vertices of a set of n input
rectangles. The following process defines the skeleton tree and the rectangles
assigned to each node. Take a rectangle including all the points and divide it
into quadrants by drawing horizontal and vertical lines through the median x-
and y-coordinates. Repeat this for each quadrant recursively, but taking medians
of the internal points only. The empty rectangles obtained eventually become
assigned to the leaves.
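A direct transcription of this process is sketched below. It is a naive
illustration: points are filtered anew at every node, whereas an efficient
implementation would keep presorted arrays. The dictionary-based node
representation and the names `build` and `leaves` are assumptions of this
example, and the lower median is used when the number of points is even.

```python
from statistics import median_low


def build(points, x1, x2, y1, y2):
    """Skeleton tree for the cell [x1, x2] x [y1, y2] over the given points."""
    # Only points strictly inside the cell take part in further splitting.
    inner = [(x, y) for x, y in points if x1 < x < x2 and y1 < y < y2]
    node = {"cell": (x1, x2, y1, y2), "children": []}
    if not inner:
        return node  # leaf: no point lies in the interior of the cell
    mx = median_low(x for x, _ in inner)
    my = median_low(y for _, y in inner)
    # The dividing lines pass through the medians, so the median points land
    # on cell boundaries and the number of inner points strictly decreases.
    for cx1, cx2 in ((x1, mx), (mx, x2)):
        for cy1, cy2 in ((y1, my), (my, y2)):
            node["children"].append(build(inner, cx1, cx2, cy1, cy2))
    return node


def leaves(node):
    """List of all leaf nodes below (and including) `node`."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


tree = build([(1, 1), (2, 3), (3, 2)], 0, 4, 0, 4)
```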

The data structures assigned to a node depend on its kind. For each inner
node there is just a counter assigned to it, which stores the number of input
rectangles completely covering the assigned rectangle. Note that there is no
rectangle vertex in a cell assigned to a leaf, but there may be horizontal and
vertical strips intersecting it. To handle their area we maintain two segment
trees at each leaf, as follows. Consider a cell $C_v$ assigned to a leaf v. Let
V be the set of vertical strips crossing $C_v$ and obtained as intersections of
the input rectangles with $C_v$. Consider the set of intervals which are
projections of these strips onto the x-axis. The area of the union of the
elements of V can be computed from the measure of the union of their
projections as

$$\mathrm{Area}\bigl(\textstyle\bigcup V\bigr) = \mathrm{length}\bigl(\mathrm{proj}_x(\bigcup V)\bigr) \cdot \mathrm{length}\bigl(\mathrm{proj}_y(C_v)\bigr).$$

The intervals can be stored in a segment tree so that this parameter can be
maintained with $O(\log n)$-time insertions and deletions. A similar situation
holds for the set H of horizontal strips. The area of the intersection
$(\bigcup H) \cap (\bigcup V)$ is given by

$$\mathrm{Area}\bigl(\textstyle\bigcup H \cap \bigcup V\bigr) = \mathrm{length}\bigl(\mathrm{proj}_x(\bigcup V)\bigr) \cdot \mathrm{length}\bigl(\mathrm{proj}_y(\bigcup H)\bigr).$$

Now $\mathrm{Area}(\bigcup H \cup \bigcup V)$ can be calculated from the
inclusion-exclusion formula:

$$\mathrm{Area}\bigl(\textstyle\bigcup H \cup \bigcup V\bigr) = \mathrm{Area}\bigl(\bigcup H\bigr) + \mathrm{Area}\bigl(\bigcup V\bigr) - \mathrm{Area}\bigl(\bigcup H \cap \bigcup V\bigr).$$
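The computation at a single leaf can be sketched as follows. For simplicity
the union lengths are recomputed from plain lists of intervals; in the data
structure described here they would be maintained by the two segment trees.
The function names and the argument layout are assumptions of this example:
`vertical` holds the x-projections of the vertical strips and `horizontal` the
y-projections of the horizontal strips crossing the cell.

```python
def union_length(intervals):
    """Total length of the union of closed intervals given as (lo, hi) pairs."""
    total, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


def leaf_area(cell, vertical, horizontal):
    """Area covered inside a leaf cell (x1, x2, y1, y2) by strips crossing it."""
    x1, x2, y1, y2 = cell
    len_v = union_length(vertical)    # length(proj_x(union of V))
    len_h = union_length(horizontal)  # length(proj_y(union of H))
    area_v = len_v * (y2 - y1)        # vertical strips span the full height
    area_h = len_h * (x2 - x1)        # horizontal strips span the full width
    area_both = len_v * len_h         # intersection of the two unions
    return area_h + area_v - area_both
```

For a 4 × 4 cell crossed by vertical strips over [0, 1] and [3, 4] and by one
horizontal strip over [1, 2], the three terms are 4, 8 and 2.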

We say that a line intersects a node (or intersects the rectangle associated with 
a node) if it intersects the interior of the rectangle associated with the node. 

Lemma 1. A horizontal or vertical line intersects at most
$\sqrt{2}\,(1 + \sqrt{2}) \cdot \sqrt{n}$ leaves of the trellis quadtree built
for n rectangles.

Proof. Consider some k points in the plane. Build the skeleton tree and
assign rectangles to the nodes, following the principle of construction of the
trellis quadtree. Let $T(k)$ denote the maximum number of leaves intersected by
a vertical line L, taken over all configurations of k points. Let S be the cell
of the root. It is crossed by a vertical line $M_v$ passing through the point
with the median x-coordinate, and by a horizontal line $M_h$ passing through the
point with the median y-coordinate. The lines $M_h$ and $M_v$ together partition
S into four rectangles, of which at most two are intersected by L. If the
rectangles intersected by L contain $k_1$ and $k_2$ points respectively, where
$k_1 + k_2 \le \lfloor k/2 \rfloor$, then L intersects at most
$T(k_1) + T(k_2)$ leaves. Hence the function $T(k)$ is determined by the
following recurrence equation:

$$T(k) = \begin{cases} 1 & \text{if } k = 0; \\ \max_{0 \le i \le \lfloor k/2 \rfloor} \bigl( T(i) + T(\lfloor k/2 \rfloor - i) \bigr) & \text{if } k > 0. \end{cases}$$

We prove by induction that the inequality

$$T(k) \le (1 + \sqrt{2}) \cdot \sqrt{k} \qquad (1)$$

holds for k > 0.

The base of induction: Let k = 1, then $T(1) = 2 \cdot T(0) = 2 < 1 + \sqrt{2}$.
The inductive step: Suppose first that $\lfloor k/2 \rfloor > i > 0$. Then

$$T(i) + T(\lfloor k/2 \rfloor - i) \le (1 + \sqrt{2}) \bigl( \sqrt{i} + \sqrt{k/2 - i} \bigr) \le (1 + \sqrt{2}) \cdot \sqrt{k}.$$

The remaining case is when i = 0. Then

$$T(0) + T(\lfloor k/2 \rfloor) \le 1 + (1 + \sqrt{2}) \sqrt{k/2} \le (1 + \sqrt{2}) \sqrt{k},$$

for k > 1 (it is here that the specific value $1 + \sqrt{2}$ is used). To
complete the proof, take k = 2n and substitute in inequality (1). □
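The bound can also be checked numerically for small k. The sketch below
assumes the recurrence in the form T(0) = 1 and, for k > 0, T(k) equal to the
maximum of T(i) + T(⌊k/2⌋ − i) over 0 ≤ i ≤ ⌊k/2⌋; it evaluates T by
memoized recursion and compares it with (1 + √2)·√k. It is a sanity check
under that assumption, not a substitute for the induction.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def T(k: int) -> int:
    """Worst-case number of leaves met by a vertical line, per the recurrence."""
    if k == 0:
        return 1
    half = k // 2
    # The two cells met by the line share at most floor(k/2) points.
    return max(T(i) + T(half - i) for i in range(half + 1))


def bound(k: int) -> float:
    return (1 + math.sqrt(2)) * math.sqrt(k)


# The inequality is verified for a range of small k.
holds = all(T(k) <= bound(k) for k in range(1, 2001))
```

At k = 2 the recurrence gives T(2) = 3 while the bound is about 3.41, which is
the case where the constant 1 + √2 is needed in the induction step.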



The overall time to find the measure of n boxes in 3-dimensional space using
trellis quadtrees is as stated in Theorem 1 and is the same as that of the
algorithm of Overmars and Yap.

Theorem 1. The measure of a union of n 3-dimensional boxes is computed in
time $O(n^{3/2} \log n)$ by the plane-sweep algorithm, if the trellis quadtree
is used to maintain the area of intersection of the sweeping plane with the
input boxes.



Proof. First we construct the trellis quadtree. Two copies of each vertex are
created, one in an array $A_y$ sorted on the y-coordinates and another in an
array $A_z$ sorted on the z-coordinates. The median of the y-coordinates is the
y-coordinate of the point stored in the middle of $A_y$; similarly, the median
of the z-coordinates is the z-coordinate of the point in the middle of $A_z$.
Partition the points into four groups corresponding to the four quadrants, in
each group storing two copies of a point: one in an array sorted on the y- and
the other in an array sorted on the z-coordinates. The minimum and maximum
coordinates of the points in a group are the coordinates of the rectangular
region associated with the node. Process each node recursively, partitioning
the points into four groups and assigning rectangular regions to each node. A
node becomes a leaf if no vertex of an input rectangle is located in the
interior of the region assigned to the node. This completes the construction of
the skeleton tree. Its depth is $O(\log n)$ and processing a level takes time
$O(n)$, so the time to build the skeleton tree is $O(n \log n)$. For each input
rectangle scan the leaves that it intersects: the intersection is a strip across
the cell associated with the leaf. Insert the interval that is the projection of
the strip along its length into the segment tree corresponding to the coordinate
of projection. Every rectangle intersects $O(\sqrt{n})$ leaves by Lemma 1, and
each leaf is located and its segment tree updated in time $O(\log n)$. The total
time to construct the tree is thus $O(n^{3/2} \log n)$.

Next perform the required sequence of insertions and deletions. Each
rectangle intersects $O(\sqrt{n})$ leaves by Lemma 1, and each leaf is located
and its parameters updated in time $O(\log n)$. Therefore the time to process
all the boxes is $O(n^{3/2} \log n)$. □

The bound of Theorem 1 is best possible: it is attained for a set of rectangles
with the vertices sufficiently evenly distributed over the input region. The
trellis quadtree occupies space $O(n^{3/2})$; however, the algorithm can be
modified to operate in space $O(n)$ and within the same time bounds, as follows
(Overmars and Yap also showed how to implement their algorithm so that it runs
in the same time and simultaneously in linear space). The idea is not to first
build the tree and then process all the input boxes, but rather to construct the
leaves one by one and compute the total volume contributed by each leaf,
discarding each processed leaf. This corresponds to visiting the nodes of the
trellis quadtree depth-first rather than breadth-first: when the tree is built
as in the proof of Theorem 1, this is done level by level, that is,
breadth-first; alternatively, we could store a path to the processed leaf on a
stack, that is, scan the nodes depth-first.

3 Dimension Four 

The trellis quadtrees generalize to higher dimensions, and in this way we obtain
another deterministic solution of the general Klee’s measure problem. In this
section we consider the case of dimension four. In our framework, the goal is
to maintain the volume of the union of 3-D parallelepipeds; again we call them
boxes for brevity.

Suppose 6n points in the 3-dimensional space are given, determined by the
vertices of a set of n input boxes. The following process defines the skeleton tree
and the boxes assigned to each node. Take a box B including all the points in
its interior; this is to be the region assigned to the root. The points are in three
copies each, stored in arrays sorted on the x-, y-, and z-coordinates, respectively.
Partition B into eight parts by the planes parallel to pairs of coordinate axes
and passing through the median x-, y- and z-coordinates of the points in B.
Next assign the points to the obtained parts of B. Suppose a part of B is the
following: $P = [x_1, x_2] \times [y_1, y_2] \times [z_1, z_2]$. A point
$(x, y, z)$ is assigned to the node of P if and only if more than one among the
inequalities $x_1 < x < x_2$, $y_1 < y < y_2$ and $z_1 < z < z_2$ holds. Notice
that a point can be assigned to four parts of B during this process. This
creates the second level of nodes of the tree. The process is repeated
recursively for each node: a node is partitioned into eight parts by the planes
passing through the medians of some of the coordinates of the points assigned to
the node. For instance, in the case of coordinate x, when [a, b] is the x-edge
of the region assigned to the node, we take the median among all the points with
the x-coordinates satisfying a < x < b, counting multiplicities. A node becomes
a leaf if no point is assigned to it. The rule of assigning points to nodes
makes it possible to apply the inclusion-exclusion formula in order to compute
the volume of the union of the boxes intersecting the region assigned to a leaf.
This is because the intersection of a box with the region assigned to a leaf is
a slice of the region bordered by two planes. To be able to use this formula, we
need to store the intervals that are the projections of the slices on the
coordinate axis perpendicular to the slice; to this end we again use interval
trees, one for each of the coordinates x, y, and z. The remaining details of
the trellis trees handling insertions and deletions of 3-dimensional boxes are
similar to the case of 2-dimensional rectangles.

The performance of the tree can be estimated by bounding the number of
leaf cells intersected by a plane parallel to a pair of coordinate axes, analogously
to Lemma 1. Consider n points inside or on the sides of a 3-D box. Let $n_i$, for
$1 \le i \le 3$, be the number of projections of points on the interior of the
i-th side (as depicted in Figure 1). The box is cut by planes parallel to pairs of
coordinate axes, and the sides are partitioned into quadrants. The projections
on the sides are partitioned accordingly. The numbers of (projections of) points
in the quadrants of the sides are denoted by the letters a, b and c with indices,
see Figure 1. Consider a plane parallel to two coordinate axes and crossing the
box. We are interested in the number of leaf cells intersected by the plane. To
be specific, let the plane intersect the sides of the box marked with
$a_1, a_2, c_1, c_2$; its intersection with the box in Figure 1 is shown with
dashed lines.

Fig. 1. A 3-D box crossed by a plane

Consider the numbers $a_i$. They satisfy the following dependencies:

$$a_1 + a_2 = a_3 + a_4 = a_1 + a_3 = a_2 + a_4 = \frac{n_1 - 1}{2},$$

where we assume that $n_1$ is odd, to avoid the floor notation; later we assume
the same about $n_2$ and $n_3$. Let $a_1 = a$, then $a_4 = a$ and
$a_2 = a_3 = \frac{n_1 - 1}{2} - a$. If the numbers $b_1$ and $c_1$ are denoted
as b and c, respectively, then similar dependencies between $b_i$ and $c_i$
hold, and we obtain $b_4 = b$, $b_2 = b_3 = \frac{n_2 - 1}{2} - b$, $c_4 = c$,
$c_2 = c_3 = \frac{n_3 - 1}{2} - c$. Let $T(n_1, n_2, n_3)$ denote the maximum
number of leaf cells intersected by a plane crossing a box, where $n_i$ are the
numbers of projections of points on the interiors of three mutually orthogonal
sides. We prove by induction that the inequality

$$T(n_1, n_2, n_3) \le 6\,(n_1 + n_2 + n_3)$$

holds for $n_1 + n_2 + n_3 > 0$. Indeed, with the notation as in Figure 1 we
obtain

$$\begin{aligned}
T(n_1, n_2, n_3) &\le T(a_1, b_1, c_1) + T(a_1, b_3, c_2) + T(a_2, b_2, c_1) + T(a_2, b_4, c_2) \\
&\le T(a, b, c) + T\bigl(a, \tfrac{n_2}{2} - b, \tfrac{n_3}{2} - c\bigr) + T\bigl(\tfrac{n_1}{2} - a, \tfrac{n_2}{2} - b, c\bigr) + T\bigl(\tfrac{n_1}{2} - a, b, \tfrac{n_3}{2} - c\bigr) \\
&\le 6\,\bigl(a + b + c + a + \tfrac{n_2}{2} - b + \tfrac{n_3}{2} - c + \tfrac{n_1}{2} - a + \tfrac{n_2}{2} - b + c + \tfrac{n_1}{2} - a + b + \tfrac{n_3}{2} - c\bigr) \\
&= 6\,(n_1 + n_2 + n_3).
\end{aligned}$$

The last inequality is correct provided no triple of the arguments of T consists
of three zeros, because $T(0, 0, 0) = 1 > 0 + 0 + 0$. If this happens then the
worst case is when some three occurrences of $T(\ldots)$ are of the form
$T(0, 0, 0)$. Then we can estimate:

$$T(n_1, n_2, n_3) \le 3 + T\bigl(\tfrac{n_1}{2}, \tfrac{n_2}{2}, \tfrac{n_3}{2}\bigr) \le 3 + 6 \cdot \frac{n_1 + n_2 + n_3}{2} \le 6\,(n_1 + n_2 + n_3).$$

This is for the case as in Figure 1; the other cases are similar. We obtain, as a
corollary, that if there are n points inside the box then the number of leaf cells
intersected by a plane is at most $18 \cdot n$.

Theorem 2. The measure of a union of n 4-dimensional boxes is computed
in time $O(n^2 \log n)$ by the plane-sweep algorithm, if the 3-dimensional
trellis quadtree is used to maintain the volume of the intersection of the
sweeping hyperplane with the input boxes.

Proof. The situation is similar to that in the proof of Theorem 1. Since a plane
may intersect up to $O(n)$ leaves, as proved above, the size of the tree is
$O(n^2)$. It can be built in time $O(n^2 \log n)$. The sweeping hyperplane
status is maintained by inserting and deleting 3-dimensional boxes and computing
the volume of their union. A single insertion or deletion is done by processing
$O(n)$ leaves, each in time $O(\log n)$. The number of these operations is
$O(n)$, hence the total time is $O(n^2 \log n)$. □

The bound of Theorem 2 is the best possible, and the algorithm can be
implemented to run in linear space and within the same time bound.

Acknowledgement. Thanks are due to Marek Chrobak and Larry Larmore
for encouraging the author to write this report, and for their criticism of the
preliminary versions.



Abstract. In this paper we present a general and still flexible modular
technique for the design of efficient leader election algorithms in N-node
networks. Our approach can be viewed as a generalization of the previous
method introduced by Korach, Kutten and Moran [7]. We show how
well-known $O(N)$ message leader election algorithms in oriented hypercubes
and tori can be derived by our technique. This is in contrast
with the $\Omega(N \log N)$ message lower bound for the approach in [7].
Moreover, our technique can be used to design new linear leader election
algorithms for unoriented butterflies and cube connected cycles, thus
demonstrating its usefulness. This is an improvement over the $O(N \log N)$
solutions obtained from the general leader election algorithm. These
results are of interest, since tori and corresponding chordal rings were
the only known symmetric topologies for which linear leader election
algorithms in the unoriented case were known.



1 Introduction 

One of the fundamental problems in distributed systems is the problem of Leader
Election, that is, the problem of transforming the system from an initial
configuration where the processors are in the same initial state, to a final
configuration where exactly one processor is in the state leader and all other
processors are in the state defeated. Leader election is widely used in
distributed computing, mainly in situations where one processor is required to
act as a central coordinator, e.g. as a part of a reinitialization or recovery
procedure.

There are a number of leader election algorithms, working on arbitrary networks
[?] as well as on special topologies: rings [?], complete graphs [?], chordal
rings [2], tori and hypercubes [?]. Much attention has been devoted to
different variants of the computational model, depending on the structural
information about the underlying network topology available at the processors.
It has been shown that more efficient solutions exist for special topologies
than for arbitrary ones (with or without additional structural information, e.g.
sense of orientation).

* The research was partially supported by EU Grant No. INCO-COP 96-0195
“ALTEC-KIT” and by the Slovak VEGA project 1/4315/97.

A general modular technique has been presented in [7], using a traversal
algorithm as a building block for leader election. This technique can be applied
to arbitrary networks and for some special topologies (e.g. complete graphs)
it yields better results than the general leader election algorithm [?]. However,
for sparse N-node networks (with O(N log N) edges) it is not better than the
general algorithm.



We propose a modular technique which overcomes this weakness. Our approach
can be seen as a generalization of the previous results from [7], exploiting
the ideas from [12,16]. Designing efficient leader election algorithms for
specific topologies can be a pretty hard task, thus general and still flexible
techniques can be helpful.

We document the last point by using our approach in the case of unoriented
butterflies and cube connected cycles. Applications of the general techniques
from [?] lead to non-satisfactory O(N log N) solutions. No better results have
been known for these topologies (even in the oriented case). We use our technique
to design O(N) leader election algorithms on these topologies, showing that
topological awareness alone is sufficient to decrease the communication
complexity to O(N) messages.

The paper is organized as follows. In Section 2, we introduce necessary
preliminaries. In Section 3, we present our modular technique, in Section 4 we
relate it to known results, and in Section 5 we apply it to wrapped butterflies
and cube connected cycles.



2 Preliminaries 

The computational model is a standard model of asynchronous distributed
computing [?]. Every message will be delivered in a finite but unbounded time.
FIFO requirements on links are not necessary. All processors are identical and
run the same algorithm.

The underlying communication topology is represented by an undirected graph
G = (V, E), where vertices represent processors and edges represent bidirectional
communication channels between them. We will use N for |V|.

We consider non-anonymous networks. That means that each processor has a
unique identification number (id) from some totally ordered set ID. These
identities are without topological significance. Moreover, each processor knows
only its own identity.

Solving the problem of Leader Election means that starting from an initial
configuration where all processors are in the same state (the only difference
being different ids), the system should reach a configuration where exactly one
processor is in the state leader and all other processors are in the state defeated.
The computation starts spontaneously in some non-empty subset of processors;
the remaining processors join the computation after receiving the first message.

We are interested in communication complexity, expressed by the maximum
id used) sent by the algorithm over all possible executions. There are many
possible executions, due to the communication asynchronicity or the unknown
subset of starting processors. We will use the notion circumstances O for a given
fixed pattern of message delays and starting set of processors. If the algorithm
is deterministic, for given O it has only one execution.

Wrapped butterflies and cube connected cycles. The n-dimensional
wrapped butterfly can be represented as a graph $BF_n = (V, E)$, where
$V = \{(i, c) \mid 0 \le i < 2^n,\ 0 \le c < n\}$ and $(i, c)$, $(j, c')$ are
connected iff $c' = c \pm 1 \bmod n$ and $i = j$ or $i$ and $j$ differ in bit
position $\min(c, c')$. By c-column we mean the set $\{(i, c) \mid 0 \le i < 2^n\}$.

The n-dimensional cube connected cycles is a graph $CCC_n = (V, E)$, where
V is as before and $(i, c)$ and $(j, c')$ are connected iff $c' = c \pm 1 \bmod n$
and $i = j$ (cycle edges) or $c = c'$ and $i$ and $j$ differ in bit position $c$
(hypercube edge). Columns of the butterfly correspond to cycles created of cycle
edges only.
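
The two definitions can be turned into a small adjacency-set construction, which is convenient for checking vertex counts and degrees on small instances. This is only a sketch: the function names are hypothetical, and the cross edge between columns c and c+1 (mod n) is assumed to flip bit c, which agrees with min(c, c') everywhere except possibly on the wrap-around edges.

```python
from itertools import product


def wrapped_butterfly(n):
    """Adjacency sets of the n-dimensional wrapped butterfly (assumed bit convention)."""
    adj = {(i, c): set() for i, c in product(range(2 ** n), range(n))}
    for i, c in product(range(2 ** n), range(n)):
        nxt = (c + 1) % n
        # Straight edge (same row) and cross edge (row differs in bit c).
        for j in (i, i ^ (1 << c)):
            adj[(i, c)].add((j, nxt))
            adj[(j, nxt)].add((i, c))
    return adj


def cube_connected_cycles(n):
    """Adjacency sets of the n-dimensional cube connected cycles."""
    adj = {(i, c): set() for i, c in product(range(2 ** n), range(n))}
    for i, c in product(range(2 ** n), range(n)):
        nxt = (c + 1) % n
        # Cycle edge inside the column of row i.
        adj[(i, c)].add((i, nxt))
        adj[(i, nxt)].add((i, c))
        # Hypercube edge: same position c, row differs in bit c.
        adj[(i, c)].add((i ^ (1 << c), c))
    return adj


bf3 = wrapped_butterfly(3)
ccc3 = cube_connected_cycles(3)
print(len(bf3), {len(nb) for nb in bf3.values()})
print(len(ccc3), {len(nb) for nb in ccc3.values()})
```

For n = 3 both graphs have 3 · 2^3 = 24 vertices; the butterfly is 4-regular and the cube connected cycles graph is 3-regular.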

3 The Synchronization Technique 

The main idea of our technique comes from the observation that the efficiency of
leader election algorithms on oriented hypercubes and tori [?] is due to the fact
that on these topologies an active processor can claim a large territory by marking
a significantly smaller number of vertices. Algorithms of Tel and Peterson work in
stages and use an ad-hoc technique to synchronize the computation. We propose a
general synchronization technique which uses an arbitrary marking algorithm A as
a building block.

Let A be a deterministic distributed algorithm initiated at a single vertex,
with an integer parameter (later interpreted as the stage number), and with an
explicit termination.

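Definition 1.

- $A^O_{v,i}$ denotes A invoked at vertex v with a parameter i, running under
  circumstances O.

- $\mathcal{R}^O_{v,i} = \{u \mid u \text{ is reached by } A^O_{v,i}\}$

- (Collision Set) $CS^O_{v,i} = \{u \mid \mathcal{R}^O_{v,i} \cap \mathcal{R}^O_{u,i} \neq \emptyset\}$

- (Size of Collision Set) $CS_{A,i} = \min_{v,O} |CS^O_{v,i}|$

- $f_{A,i} = \max_{v,O}(\text{\# of messages used by } A^O_{v,i})$

When A is clear from the context, we will omit it and use shorter terms
$\mathcal{R}^O_{v,i}$, $CS_i$ and $f_i$, respectively.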

In the following description of the algorithm we will omit O indices. All 
references will be assumed under the circumstances under which this particular 
execution of the algorithm runs. 

The election algorithm $\mathcal{E}$ works in stages. During each stage, actions are
initiated by processors active in this stage, resulting in the extinction of some
active processors. At the beginning, spontaneously awakened processors are the
only active ones. If a processor is awakened by some incoming message, it will
become passive for the rest of the computation. Stages are repeated until either
only one processor (the leader) remains active or stages cannot be performed
efficiently any more. In the second case, another algorithm is launched afterwards
to choose the leader from the remaining active processors.

Let v be an active processor in a stage i. Activity launched by v at stage i
consists of three phases:

1. (forward phase) $A_{v,i}$ is launched, with suffix (v, i, forward) added to each
of its messages. (This suffix is ignored by A.) During this phase, the spanning
tree $\mathcal{T}_{v,i}$ of $\mathcal{R}_{v,i}$ rooted at v is built.

2. (backward phase) Reversed computation proceeds from the leaves of
$\mathcal{T}_{v,i}$ backward to v. It is clear when to start, because A terminates
explicitly. During the backward phase collisions are tested. When it is finished,
v can tell whether it survives stage i. The processor v survives only if no
collision with a stronger processor occurred (here the suffixes added in the
forward phase are used). A processor is stronger if it is in a higher stage or it
is in the same stage, but with a higher id.

3. (acknowledgement phase) $\mathcal{T}_{v,i}$ is used to broadcast on
$\mathcal{R}_{v,i}$ whether v survived the stage i. ((v, i, ack, 1) and
(v, i, ack, 0) are used to signal survived and killed, respectively.)
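
The strength relation used in the backward phase is a lexicographic order on (stage, id) pairs. The following sketch illustrates only that comparison; the names `Claim`, `is_stronger` and `survives` are hypothetical, and the bookkeeping with ack messages in the procedure Test is deliberately left out.

```python
from typing import Iterable, NamedTuple


class Claim(NamedTuple):
    """A (stage, id) pair carried in the suffix of forward messages."""
    stage: int
    ident: int


def is_stronger(a: Claim, b: Claim) -> bool:
    # Higher stage wins; within the same stage the higher id wins.
    # NamedTuple instances compare lexicographically, field by field.
    return a > b


def survives(own: Claim, collisions: Iterable[Claim]) -> bool:
    # Simplified rule: survive iff no colliding claim is stronger.
    return not any(is_stronger(other, own) for other in collisions)


print(survives(Claim(1, 5), [Claim(1, 3), Claim(0, 9)]))
print(survives(Claim(1, 5), [Claim(2, 1)]))
```

Putting the stage first in the tuple is what makes a processor in a higher stage beat any id from a lower stage.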

The acknowledgement phase is needed to allow deadlock-free implementation 
of the backward phase. 

If $A_{v,i}$ uses less than $2|\mathcal{R}^O_{v,i}|$ messages and
$\mathcal{R}^O_{v,i}$ does not depend on O, it is more efficient to use $A_{v,i}$
in the backward and acknowledgement phases, thus saving the construction of the
spanning tree of $\mathcal{R}_{v,i}$.

There are some technical difficulties in the backward phase due to the
asynchronicity, so we will discuss this phase in more detail.

Backward phase at a leaf u of $\mathcal{T}_{v,i}$:

    Test(v, i);

Backward phase at a non-leaf vertex u of $\mathcal{T}_{v,i}$:

    Wait for messages (v, i, backward, x) from all your sons in $\mathcal{T}_{v,i}$;
    Compute y as the and of all x fields of these messages;
    if y = 0 then Send (v, i, backward, 0) to your father
    else if u ≠ v then Test(v, i)
    else broadcast (v, i, ack, y) on $\mathcal{T}_{v,i}$

Procedure Test(v, i) at a vertex u:

Let (w, j) be the maximal pair that u has seen, not taking into account (v, i).
If (w, j, ack, 1) has been received, then take (w, j + 1) instead of (w, j). Let
(w', j') be the maximal pair such that no ack message for it has been received.

    if (w, j) > (v, i) then Send (v, i, backward, 0) to your father
    else if (w', j' + 1) < (v, i) then Send (v, i, backward, 1) to your father
    else Wait for receiving (w', j', ack, x) for all (w', j') such that
             (w', j' + 1) > (v, i) and no ack message has been received for (w', j').
         Compute y as the and of all ¬x fields of these ack messages.
         Send (v, i, backward, y) to your father;

Lemma 1. The algorithm £ is deadlock-free. 

Proof. The waiting for actions of some other active processor occurs only in
the procedure Test(). Here (v, i) waits for (w', j') only if (w', j') < (v, i) (and
(w', j' + 1) > (v, i)). (If (w', j') > (v, i) then also (w, j) > (v, i) and no
waiting is induced.) That means that in the oriented dependency graph of the
waiting relation each edge leads to the weaker vertex, thus no cycle can occur.

Lemma 2. There always remains at least one active processor. 

Proof. The strongest processor (with the highest (w, j)) can be beaten only by
a (slightly weaker) surviving processor.

Lemma 3. Let u and v be two processors active at stage i (in the execution of
$\mathcal{E}$ under circumstances O). If $u \in CS^O_{v,i}$ then at most one of them
is active at stage i + 1.

Proof. W.l.o.g. assume u < v. (As $u \in CS_{v,i}$ implies $v \in CS_{u,i}$.) If
there is $w \in \mathcal{R}_{v,i} \cap \mathcal{R}_{u,i}$ such that it received a
message (v, i, forward) before receiving (u, i, forward), then u will not survive
because of the first test in the procedure Test(u, i) at the vertex w. If there is
no such vertex and u survived the stage i, then v will not survive, because it
will wait for ack messages for (u, i) in some vertices of
$\mathcal{R}_{v,i} \cap \mathcal{R}_{u,i}$. These ack messages will eventually
arrive (because u survives) and kill v.

Corollary 1. For $i \ge 1$, there are at most $\lfloor N/CS_{i-1} \rfloor$ processors
active during the stage i, where $CS_0 = 1$.

Proposition 1. Let the algorithm $\mathcal{E}$ run up to k stages, then

1. there are at most $\lfloor N/CS_k \rfloor$ active processors,

2. $O(N \sum_{i=1}^{k} f_i/CS_{i-1})$ messages are used.

Proof. Straightforward consequence of Corollary 1 and the definitions of $f_i$
and $CS_i$.

Our construction shows how to reduce the problem of designing a leader
election algorithm to the problem of choosing a suitable marking algorithm A
such that A is efficient ($f_i$ is preferably low), but it can claim a large
territory ($CS_i$ is high).

4 Relationship to Known Results

In this section we show how some well-known algorithms for leader election can
be viewed as optimized instances of our technique. The algorithm $\mathcal{E}$ is
rather clumsy in comparison with these approaches, because it does not exploit
properties of the underlying marking algorithm A (e.g. A being a traversal or the
fact that $\mathcal{R}_{v,i}$ and $\mathcal{R}_{u,i}$ may greatly overlap). However,
such optimizations improve only the constant factor; they do not influence the
asymptotic behaviour.

Modular technique by Korach, Kutten and Moran [7]. Our technique
works differently from the construction from [7], but reaches the same complexity:
Take $A_{v,i}$ to be a traversal algorithm that reaches $2^i$ vertices. Trivially
$CS^O_{v,i} \supseteq \mathcal{R}^O_{v,i}$, so $CS_i \ge 2^i$. Following arguments
from [7] we get the leader election algorithm with the same asymptotic bound on
the communication.

Leader election on oriented hypercubes [16]. Choose $A_{v,i}$ to be two
broadcasts: (a) on the sub-hypercube spanning the first $\lfloor i/2 \rfloor$
dimensions from v and (b) on the sub-hypercube spanning dimensions
$\lfloor i/2 \rfloor + 1, \ldots, i$ from v. We get $f_i = O(2^{i/2})$,
$CS_i = O(2^i)$ (the sub-hypercube spanning the first i dimensions from v) and
the number of stages k = log N, resulting in the overall communication
$O(N \sum_i 1/2^{i/2}) = O(N)$.
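
The geometric sum behind this bound can be checked numerically. The sketch below assumes the reading that stage i is run by at most N/CS_{i-1} active processors, each spending about f_i messages, and plugs in f_i = 2^{i/2} and CS_{i-1} = 2^{i-1} with all constant factors dropped; the function name is hypothetical.

```python
def stage_cost_sum(k: int) -> float:
    """Sum of f_i / CS_(i-1) over stages 1..k for the hypercube instance.

    Assumes f_i = 2**(i/2) and CS_(i-1) = 2**(i-1), constants dropped.
    """
    return sum(2 ** (i / 2) / 2 ** (i - 1) for i in range(1, k + 1))


for k in (5, 10, 20, 40):
    # The partial sums grow with k but stay below a fixed constant,
    # so N times the sum is O(N).
    print(k, round(stage_cost_sum(k), 4))
```

Each term equals 2 / 2^{i/2}, so the partial sums are bounded by 2(√2 + 1) < 4.83 independently of the number of stages.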

Leader election on (unoriented) tori [?]. Let $A_{v,i}$ be an algorithm
that marks the boundary of a square of side $\alpha^i$ for some $\alpha > 1$ (v is a
corner of this square). This can be done using $O(\alpha^i)$ messages even on
unoriented tori (see [10,11]), thus $f_i = O(\alpha^i)$. $CS_i = \alpha^{2i}$, since
there is no vertex inside the marked square that can mark the boundary of its
square of the same size without crossing the boundary of the surrounding square.
k is set to $\log_\alpha(\min(n_1, n_2))$, where $n_1$ and $n_2$ are sizes of the
torus. Following Proposition 1 we get O(N) messages for
$\log(n_1/n_2) \in O(1)$. In case of less balanced tori, a special termination
phase is needed to choose a leader from the remaining active processors (the
same as in the original algorithm [12]).



5 New Results 

We combine the proposed technique with computing preorientation [10,11] and the
matching technique for hypercubes [?], to achieve linear leader election
algorithms for unoriented wrapped butterflies and cube connected cycles. No
previous results for these topologies were known, and by application of general
techniques from [?] we get O(N log N) solutions.



5.1 Leader Election on Unoriented Wrapped Butterflies

Computing preorientation. In the butterfly there are just two cycles of length
4 passing through a given vertex, and vertices of each of these cycles lie in the
same direction (see Figure 1). This means that after a local precomputation up
to distance 2, each vertex can divide its 4 links into two groups: the forward
and the backward links.

There is no global knowledge of what is the forward and what is the backward
direction; however, our algorithm does not need it. Note that computing
preorientations can be done in O(N) messages, since the degree of the butterfly is
constant and we need to look up to constant distance only.
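
One way to read this local precomputation is: two links of v belong to the same group exactly when their endpoints have a common neighbour other than v, i.e. when they lie on a common 4-cycle through v. The sketch below checks this reading on a centrally built butterfly; it is not a distributed implementation, the function names are hypothetical, the bit convention of the cross edges is an assumption, and n = 5 is chosen so that no short wrap-around cycles interfere.

```python
from itertools import product


def wrapped_butterfly(n):
    """Adjacency sets of the wrapped butterfly (cross edge c -> c+1 flips bit c)."""
    adj = {(i, c): set() for i, c in product(range(2 ** n), range(n))}
    for i, c in product(range(2 ** n), range(n)):
        for j in (i, i ^ (1 << c)):
            adj[(i, c)].add((j, (c + 1) % n))
            adj[(j, (c + 1) % n)].add((i, c))
    return adj


def preorient(adj, v):
    """Group the neighbours of v that share a 4-cycle through v."""
    groups = []
    for a in sorted(adj[v]):
        for group in groups:
            # a joins a group if it has a common neighbour (other than v)
            # with some member of that group.
            if any((adj[a] & adj[b]) - {v} for b in group):
                group.append(a)
                break
        else:
            groups.append([a])
    return groups


bf5 = wrapped_butterfly(5)
print(preorient(bf5, (0, 0)))
```

For the vertex (0, 0) of the 5-dimensional butterfly this yields the two groups [(0, 1), (1, 1)] and [(0, 4), (16, 4)], one per neighbouring column; which of them is called forward is a purely local choice.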

The marking algorithm $A_i$ (closely resembles $A_i$ for oriented hypercubes):

    Fill(i, forward);
    FarFill(i, forward);
    Fill(i, backward);      {needed, because there is no global consistency}
    FarFill(i, backward);   {of what is forward and what backward}

Procedure Fill(i, direction)

    At initiator:
        Send (fill, i) to both neighbours in the desired direction;

    Upon receiving (fill, i):
        if i > 1 then Send (fill, i − 1) on both links in the opposite direction;

Procedure FarFill(i, direction)

    At initiator:
        Choose one neighbour in the desired direction and send him
        a (farfill, i, 2i) message;

    Upon receiving (farfill, i, j):
        if j = 1 then launch Fill(i) in the direction from which you received
                      this message
        else Send (farfill, i, j − 1) to one of the neighbours in the opposite
             direction.

Note that Ai is independent of circumstances, so we can omit the index O. 
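
To get a feeling for the cost of the marking algorithm, the procedure Fill can be simulated layer by layer on a butterfly whose columns are known to the simulator. This is a sketch under assumptions: the function names are hypothetical, the cross edge between columns c and c+1 is assumed to flip bit c, each vertex of the current layer is taken to send exactly two messages, and FarFill is counted as 2i hops followed by one Fill.

```python
def fill(n, start, depth, step=1):
    """Simulate Fill(depth, direction) from `start`; step=+1 forward, -1 backward.

    Returns the last marked layer and the number of messages sent.
    """
    frontier = {start}
    messages = 0
    for _ in range(depth):
        nxt = set()
        for row, col in frontier:
            # Bit flipped by the cross edge leaving (row, col) in this direction.
            bit = col if step == 1 else (col - 1) % n
            for new_row in (row, row ^ (1 << bit)):
                nxt.add((new_row, (col + step) % n))
                messages += 1
        frontier = nxt
    return frontier, messages


def marking_messages(n, i):
    """Messages of 2 Fill + 2 FarFill under the assumptions stated above."""
    fill_cost = fill(n, (0, 0), i)[1]
    return 4 * fill_cost + 4 * i


layer, cost = fill(6, (0, 0), 3)
print(len(layer), cost, marking_messages(6, 3))
```

Under this reading a single Fill(i) sends 2 + 4 + ... + 2^i messages and its last layer holds 2^i vertices of one column, which is in line with a bound of order 2^i on the cost of $A_i$.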

Lemma 4.

1. $f_i \in O(2^i)$

2. $CS_i \ge 2^{2i}$ for $0 \le i \le n/2$

3. $CS_i \ge (2i - n)2^n$ for $n/2 \le i \le n$

Proof. 1. The complexity of $A_i$ is 2·Fill + 2·FarFill = 4i + 4·Fill =
$4i + 4(2^{i+1} - 1) \in \Theta(2^i)$.

2. We will prove the following statement. Let u = (x, c), v = (y, c). If x and y
differ only in the substring of 2i consecutive bits starting from the position
c + 1, then $\mathcal{R}_{v,i} \cap \mathcal{R}_{u,i} \neq \emptyset$
($u \in CS_{v,i}$). This proves (2), since for each u there are $2^{2i}$ such v.

Consider a vertex v' from which the procedure Fill(i, _) was launched from
FarFill(). (There are two such v' for each v; take one of them.) v' may differ
from u only in 2i bits starting from the position c + 1. All possibilities of the
first i bits are tried in the last layer marked by Fill(i, _) launched from u to
the direction v' lies in. All possibilities of the last i bits (of these 2i bits
they may differ) are tried at the layer of vertices marked by Fill(i, _) launched
from v'. So, there must be a vertex at the column (c + i) mod n which lies
in $\mathcal{R}_{v,i} \cap \mathcal{R}_{u,i}$.

3. We will prove the following statement. Let u = (x, c'), v = (y, c). If u lies in
some of the 2i − n columns starting from c, then $u \in CS_{v,i}$. Since there are
$2^n$ vertices at each column, this shows that $CS_{v,i} \ge (2i - n)2^n$ for each v.
The proof follows as in the previous case. Let v' be as before. u and v'
may now differ in all n bits. All vertices that differ from u only in bits
c' + 1, ..., c' + i are marked by an invocation of Fill(i, _) from u. Vertices
that differ from v' only in bits c + i, ..., c + 2i are marked by the invocation
of Fill(i, _) from v'. Since 2i > n and c' is between c and 2i + c, there is
$w \in \mathcal{R}_{v,i} \cap \mathcal{R}_{u,i}$ in each of the columns
c + i, ..., c' + i.



To complete the description of the algorithm, we must ensure that further
stages are not needed. This can be done in the following way. If a vertex v is
reached by a fill message (messages from Fill() launched from FarFill() are not
counted) that was initiated by it, then v knows that it is the only surviving
processor and broadcasts its identity.

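The overall complexity can be computed following Proposition 1 and Lemma 4:

$$O\Big(N \sum_{i=0}^{\lfloor n/2 \rfloor} \frac{2^{i+1}}{2^{2i}}
  + N \sum_{i=\lfloor n/2 \rfloor + 1}^{n-1} \frac{2^{i+1}}{(2i-n)\,2^{n}}\Big) = O(N)$$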



Since the final broadcasting is trivially O(N), we have



Proposition 2. The leader election algorithm $\mathcal{E}$ on unoriented N-node
butterflies uses O(N) messages.



5.2 Leader Election on Unoriented Cube Connected Cycles 

The algorithm for CCC simulates the algorithm for the wrapped butterfly. Since
the wrapped butterfly can be embedded in CCC with dilation 2 [2], the
complexity of the resulting algorithm is again O(N).

All that is needed is to show how to simulate a preoriented butterfly on
an unoriented cube connected cycles of the same dimension. First, we preorient
$CCC_n$. This can be done in a similar way as in the preorientation of the
n-dimensional butterfly. We use the fact that for n > 8 there are exactly two
circles of length 8 passing through a given vertex. The edge which lies in both
circles is the hypercube edge, the remaining two are the cycle edges. The ability
to distinguish forward and backward edges on the butterfly corresponds on the CCC
to the ability to distinguish consistently left/right when crossing the hypercube
edge from one cycle (of length n) to another. This can be done by adding the id of
the left neighbour to the messages. The receiver can distinguish left by inspecting
which circle (of length 8) contains this left neighbour.

6 Conclusions 

We have presented a general, but still flexible modular method for the design of
efficient (linear) leader election algorithms and shown its suitability for
specific topologies (oriented tori and hypercubes, unoriented butterflies and cube
connected cycles) over the existing approaches [?].

The linear algorithms for unoriented wrapped butterflies, CCC and tori
suggest that on regular graphs of constant degree the global orientation is not
necessary for optimal leader election. On the other hand, the
$\Omega(N \log N)$ lower bound on unoriented complete graphs [?] shows that the
lack of global orientation induces an additional log N factor in this case. An
O(N log log N) algorithm is known for unoriented hypercubes [?], indicating that
the additional factor is in this case at most O(log log N). These results suggest
the following problem:
Prove or disprove the $\Omega(N \log d)$ lower bound for leader election on
unoriented vertex symmetric graphs of degree d. The stronger version is perhaps
even more interesting: prove or disprove this lower bound for the weaker problem
of reducing the number of active vertices to N/(d + 1). (The warrior technique
from [?] gives an
O(N log d) upper bound for the last problem.)

Abstract. In this paper, we investigate various concepts of leftmost
derivation in grammars controlled by bicoloured digraphs, especially
regarding their descriptive capacity. This approach allows us to unify the
presentation of known results regarding especially programmed grammars
and matrix grammars, and to obtain new results concerning grammars
with regular control, and periodically time-variant grammars. Moreover,
we get new results on leftmost derivations in conditional grammars.



1 Introduction 

Although leftmost derivations (mostly leftmost derivations of type 3 explained
below) played a vital role when several rewriting mechanisms were defined
around 1970, there has been no systematic research in that direction. This
is surprising: a leftmost restriction may be a means to get rid of the seemingly
inherent nondeterminism feature in grammars, see [2], which renders these
mechanisms hard to apply. We study leftmost derivations in regulated rewriting
systematically by using the framework of graph-controlled grammars, which are
an intuitively appealing unifying framework for many seemingly different ways
of regulation (see [?]). Hence, we obtain simplified versions of known and
many new results on leftmost derivation in regulated rewriting (compare
Section 1.4 in [?]). Moreover, such a systematic study is useful to detect
unexplored sub-areas, for example concerning leftmost derivation in time-variant
grammars. This paper continues our previous works on leftmost derivation, see [?].

The need for such a systematic exposition is underlined by [?] and [?],
where ideas from regulated rewriting are applied to parsing theory and database
theory, respectively, although the papers indicate that the knowledge of basic
facts in regulated rewriting is not too widespread. Note that in [?] programmed
grammars with left-1 derivations are used without naming them such.

Conventions: $\subseteq$ denotes inclusion, $\subset$ denotes strict inclusion.
$\emptyset$ denotes the empty set. The empty word is denoted by $\lambda$. The
length of a word x is denoted by |x|. We consider two languages $L_1, L_2$ to be
equal iff $L_1 \setminus \{\lambda\} = L_2 \setminus \{\lambda\}$.

Some knowledge about formal languages is assumed on the side of the reader,
see [?], especially regarding the Chomsky hierarchy
$\mathcal{L}(\mathrm{REG}) \subset \mathcal{L}(\mathrm{CF}) \subset
\mathcal{L}(\mathrm{CS}) \subset \mathcal{L}(\mathrm{RE})$. Sometimes, we use the
following fact:

* Supported by Deutsche Forschungsgemeinschaft grant DFG La 618/3-2.

Lemma 1. Let $\mathcal{L}$ be a trio and let $\mathcal{L}'$ be closed under union
and contain all finite languages. If for all $L \in \mathcal{L}$, where
$L \subseteq V^*$, and for all $a \in V$ we know that $L\{a\}$ lies in
$\mathcal{L}'$ (or, for all $a \in V$ we know that $\{a\}L$ lies in $\mathcal{L}'$),
then $\mathcal{L} \subseteq \mathcal{L}'$.

2 Definitions and Known Properties 

First, we introduce the concept of graph-controlled rewriting, introduced by
Wood in [?]. This allows us to present programmed, matrix (set) and
time-variant grammars as well as grammars with regular (set) control as special
cases, leading to a unified and lucid framework of definitions and arguments as it
has been done in [?]. For historic references, we refer to our quoted papers.

A grammar controlled by a bicoloured digraph or G grammar is an 8-tuple
$G = (V_N, V_T, P, S, \Gamma, \Sigma, \Phi, h)$ where

• $V_N$, $V_T$, P, S define, as in a phrase structure grammar, the set of
nonterminals, terminals, context-free core rules, and the start symbol,
respectively;

• $\Gamma$ is a bicoloured digraph, i.e., $\Gamma = (U, E)$, where U is a finite set
of nodes and $E \subseteq U \times \{g, r\} \times U$ is a finite set of directed
edges (arcs) coloured by g or r (“green” or “red”);

• $\Sigma \subseteq U$ are the initial nodes;

• $\Phi \subseteq U$ are the final nodes;

• $h: U \to (2^P \setminus \{\emptyset\})$ relates nodes with rule sets.

There are two different definitions of the appearance checking (ac) mode in the
literature: We say that $(x, u) \Rightarrow (y, v)$ ($(x, u) \Rightarrow_c (y, v)$,
respectively) holds in G with $(x, u), (y, v) \in (V_N \cup V_T)^* \times U$, if
either $x = x_1 \alpha x_2$, $y = x_1 \beta x_2$, $\alpha \to \beta \in h(u)$,
$(u, g, v) \in E$ or every (one, respectively) rule of h(u) is not applicable to x,
y = x, $(u, r, v) \in E$. The reflexive transitive closure of $\Rightarrow$
($\Rightarrow_c$, respectively) is denoted by $\Rightarrow^*$ ($\Rightarrow^*_c$,
respectively). For $m \in \{c, \lambda\}$, the languages generated by G
are defined by $L_m(G) = \{x \in V_T^* \mid \exists u \in \Sigma\ \exists v \in \Phi\,
((S, u) \Rightarrow^*_m (x, v))\}$.
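
The derivation relation can be made concrete with a small interpreter. The sketch below is an illustration under assumptions, not the paper's formalism verbatim: symbols are single characters, uppercase letters are nonterminals, rules are (lhs, rhs) pairs, the names `successors` and `language` are hypothetical, and the toy grammar generating a^n b^n is made up for the example.

```python
from collections import deque


def successors(config, h, edges, mode=""):
    """One derivation step from (word, node); mode "" is the basic ac mode, "c" the c mode."""
    word, u = config
    out = set()
    for lhs, rhs in h[u]:
        for k, symbol in enumerate(word):
            if symbol == lhs:  # free derivation: any occurrence may be rewritten
                new_word = word[:k] + rhs + word[k + 1:]
                out.update((new_word, v) for a, colour, v in edges
                           if a == u and colour == "g")
    blocked = [lhs not in word for lhs, _ in h[u]]
    # Basic mode: every rule of h(u) fails; c mode: at least one rule fails.
    fails = any(blocked) if mode == "c" else all(blocked)
    if fails:
        out.update((word, v) for a, colour, v in edges if a == u and colour == "r")
    return out


def language(h, edges, initial, final, start="S", max_len=6, mode=""):
    """Terminal words of length <= max_len derivable into a final node (non-erasing rules)."""
    seen = {(start, u) for u in initial}
    queue = deque(seen)
    words = set()
    while queue:
        word, u = queue.popleft()
        if u in final and not any(s.isupper() for s in word):
            words.add(word)
        for nxt in successors((word, u), h, edges, mode):
            if len(nxt[0]) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


# Hypothetical toy grammar: one rule per node, green arcs force A and B to grow in step.
H = {0: [("S", "AB")], 1: [("A", "aA")], 2: [("B", "bB")],
     3: [("A", "a")], 4: [("B", "b")], 5: [("B", "b")]}
EDGES = {(0, "g", 1), (0, "g", 3), (1, "g", 2), (2, "g", 1), (2, "g", 3),
         (3, "g", 4), (4, "g", 5), (1, "r", 5)}
print(sorted(language(H, EDGES, initial={0}, final={5})))
```

The red arc from node 1 is never used in derivations from S, because every sentential form reaching node 1 still contains A; calling `successors(("ab", 1), H, EDGES)` directly shows the appearance-checking branch.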

(i) G is said to be with unconditional transfer iff
$\forall u, v \in U\,((u, g, v) \in E \iff (u, r, v) \in E)$.

(ii) If $E \cap U \times \{r\} \times U = \emptyset$, G is without appearance checking.

Depending on the chosen derivation mode $m \in \{c, \lambda\}$, the language
families are denoted by $\mathcal{L}(G_m, \mathrm{CF}, \mathrm{ac})$ (with
appearance checking), $\mathcal{L}(G_m, \mathrm{CF}, \mathrm{ut})$ (with
unconditional transfer), and $\mathcal{L}(G_m, \mathrm{CF})$ (without appearance
checking). “CF” indicates that we allow context-free rules only. Here and in the
following, we write CF−λ instead of CF if we do not allow erasing core rules of
the form $A \to \lambda$. By definition, we have for
$X \in \{\mathrm{CF}, \mathrm{CF}{-}\lambda\}$, $m \in \{c, \lambda\}$:

$\mathcal{L}(G_c, X) = \mathcal{L}(G, X) \subseteq \mathcal{L}(G_m, X, \mathrm{ac})$,
and $\mathcal{L}(G_m, X, \mathrm{ut}) \subseteq \mathcal{L}(G_m, X, \mathrm{ac})$.

We consider six special cases of G grammars in the following: 

• A programmed grammar or P grammar has the following features:
o Every node contains exactly one rule. (Therefore, both modes of ac coincide.)
o All nodes are both initial and final.

• A grammar with regular set control or rSC grammar is a G grammar obeying: 

o If there is a red arc from node u to node v, then there is also a green arc 
from node u to node v. 

• A grammar with regular control or rC grammar is an rSC grammar such that 
o every node contains exactly one rule. (Therefore, both modes of ac coincide.) 

• A matrix set grammar or MS grammar is an rSC grammar when: 

o Only the initial nodes (not necessarily containing rules with left-hand side S) 
are allowed to have more than one in-going green arc, while only the final 
nodes are allowed to have more than one out-going green arc. Between every 
final node and every initial node, there is a green arc. 

• A matrix grammar or M grammar is both an MS and an rC grammar. 

• A (periodically) time-variant grammar or TV grammar has the following fea- 
tures: 

o If there is a red arc from node u to node v, then there is also a green arc 
from node u to node v. Every node has exactly one in-going green arc and 
one out-going green arc. In other words, the graph of green arcs has a simple 
ring structure. 

o There is one designated initial node, and every node can be a final node. 

Here and in the following, we leave out unnecessary components when defining
special cases of G grammars, e.g., initial and final nodes need not be specified
for P grammars. As language families, we obtain $\mathcal{L}(X, Y, Z)$, where
$X \in \{P, rSC_m, rC, MS_m, M, TV_m\}$, $Y \in \{\mathrm{CF}, \mathrm{CF}{-}\lambda\}$,
$Z \in \{\mathrm{ac}, \mathrm{ut}, \lambda\}$, $m \in \{c, \lambda\}$.

Obviously, every M[S] grammar is also an r[S]C grammar, and it is not hard
to see that every TV grammar is also an rSC grammar. Besides these trivial
relations, the following results are known in the area of free derivations [?]:

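Theorem 2. Let $X \in \{P, M, rC\}$, $Y \in \{\mathrm{CF}, \mathrm{CF}{-}\lambda\}$,
$Z \in \{\mathrm{ac}, \mathrm{ut}, \lambda\}$, $Z' \in \{\mathrm{ac}, \lambda\}$.
Then, one finds

1. $\mathcal{L}(X, Y, Z) = \mathcal{L}(G_c, Y, Z) = \mathcal{L}(TV_c, Y, Z) = \mathcal{L}(rSC_c, Y, Z) = \mathcal{L}(MS_c, Y, Z)$;

2. $\mathcal{L}(X, Y, Z') = \mathcal{L}(G, Y, Z') = \mathcal{L}(TV, Y, Z') = \mathcal{L}(rSC, Y, Z') = \mathcal{L}(MS, Y, Z')$;

3. $\mathcal{L}(G, Y, \mathrm{ut}) = \mathcal{L}(TV, Y, \mathrm{ut}) = \mathcal{L}(rSC, Y, \mathrm{ut}) = \mathcal{L}(MS, Y, \mathrm{ut}) = \mathcal{L}(M, Y, \mathrm{ac})$.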

3 Two Further Derivation Modes 

Motivated by the left-2 derivation (as defined in the literature, see discussion
in Section 4), it is possible to define a third and fourth derivation mode
besides $\Rightarrow$ and $\Rightarrow_c$ for graph-controlled grammars. Let us call
these modes the set modes, abbreviated as $\Rightarrow_{s_1}$ and
$\Rightarrow_{s_2}$. They are defined as follows. Let
$G = (V_N, V_T, P, S, \Gamma, \Sigma, \Phi, h)$ with $\Gamma = (U, E)$ be a G grammar.
Let $(x, V), (x', V') \in (V_N \cup V_T)^* \times 2^U$. We say that
$(x, V) \Rightarrow_{s_1} (x', V')$ ($(x, V) \Rightarrow_{s_2} (x', V')$,
respectively) holds in G, if either $x = x_1 \alpha x_2$, $x' = x_1 \beta x_2$,
$\alpha \to \beta \in h(v)$, for some $v \in V$, where
$V' = \{v' \mid (v, g, v') \in E\}$, or no rule of $\bigcup_{v \in V} h(v)$ is
applicable to x, $x' = x$, $V' = \{v' \mid (v, r, v') \in E\}$ for some $v \in V$ in
case of $\Rightarrow_{s_1}$ and $V' = \{v' \mid \exists v \in V\,(v, r, v') \in E\}$
in case of $\Rightarrow_{s_2}$. The reflexive transitive closure of
$\Rightarrow_{s_1}$ ($\Rightarrow_{s_2}$, respectively) is denoted by
$\Rightarrow^*_{s_1}$ ($\Rightarrow^*_{s_2}$, respectively). The corresponding
languages generated by G are defined by
$L_s(G) = \{x \in V_T^* \mid \exists V \in 2^U,\ V \cap \Phi \neq \emptyset
\text{ and } (S, \Sigma) \Rightarrow^*_s (x, V)\}$ for $s \in \{s_1, s_2\}$. As with
the $\Rightarrow_c$ mode, we indicate the use of one of the set modes by
subscribing.

Remark 3. 1. $L_m(G) = L(G)$ for every $m \in \{c, s_1, s_2\}$ and every G grammar
G without ac, since it does not matter whether we choose first the node and
then the rule to be applied or whether we choose a rule from the whole set.

2. $L_m(G) = L(G)$ for every $m \in \{s_1, s_2\}$ and every TV grammar G, since
there is always at most one possible continuation node.

3. $L_m(G) = L(G)$ for every $m \in \{c, s_1, s_2\}$ and every RC grammar G by
combining the arguments of the first two points: inside the matrices of the
MS grammar, there is (at most) one possible continuation (ii), and between
(different) matrices, there are only green arcs (i).



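Lemma 4. For $Y \in \{\mathrm{CF}, \mathrm{CF}{-}\lambda\}$,
$Z \in \{\mathrm{ac}, \mathrm{ut}\}$, $s \in \{s_1, s_2\}$, we can show:
$\mathcal{L}(G, Y, Z) \subseteq \mathcal{L}(G_s, Y, Z)$.

For reasons of space, we omit the nearly purely structural construction of the
proof of the lemma; similar ideas also work for P, M[S], and r[S]C grammars.

Lemma 5. For $Y \in \{\mathrm{CF}, \mathrm{CF}{-}\lambda\}$, $s \in \{s_1, s_2\}$,
$\mathcal{L}(G_s, Y, \mathrm{ac}) \subseteq \mathcal{L}(G, Y, \mathrm{ac})$.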

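Proof. Let $G = (V_N, V_T, P, S, \Gamma, \Sigma, \Phi, h)$ with $\Gamma = (U, E)$ be a
G grammar with ac. We construct, for $s \in \{s_1, s_2\}$, a G grammar
$G_s = (V_N \cup \{F\}, V_T, P \cup \{A \to F \mid A \in V_N\}, S, \Gamma_s,
\{(u, \Sigma) \mid u \in \Sigma\}, \{(u, \Phi) \mid u \in \Phi\}, h')$ with ac; let
$h'((u, V)) = h(u) \cup \{A \to F \mid \exists A \to w \in \bigcup_{u' \in V} h(u')\}$
and $\Gamma_s = (U_s, E_s)$ such that $U_s = U \times 2^U$, where $E_s$ is given by

• $((u, V), g, (u', V')) \in E_s$ iff $(u, g, u') \in E$ and
  $V' = \{\bar{u} \mid (u, g, \bar{u}) \in E\}$;

• $((u, V), r, (u', V')) \in E_{s_1}$ iff $(u, r, u') \in E$ and
  $V' = \{\bar{u} \mid (u, r, \bar{u}) \in E\}$;

• $((u, V), r, (u', V')) \in E_{s_2}$ iff $(\hat{u}, r, u') \in E$ for some
  $\hat{u} \in V$ and $V' = \{\bar{u} \mid (\hat{u}, r, \bar{u}) \in E, \hat{u} \in V\}$. □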

It is clear that the previous construction can be changed by serialization of
the test rules $A \to F$ in order to work for programmed grammars as well. For
matrix (set) grammars, one has to do a similar trick only at the “merging points”
between the matrices. Observe that the construction above does not work in case
of unconditional transfer. This is no coincidence, as the following lemma tells us.



Lemma 6. For Y G {CF,CF-X}, sG { 51 , 53 }, C{Mg,Y,ut) =£{M,Y,ac). 
Proof. The previous lemma and the subsequent remarks show the direction $\subseteq$.
For the other direction, consider a P grammar $G = (V_N, V_T, P, S, \Gamma = (U, E), h)$,
$V_N = \{A_1, \dots, A_n\}$, $U = \{u_1, \dots, u_k\}$, with ac. We have the following matrices in
a simulating M grammar $G' = (V_N', V_T, P', S', \Gamma', h')$ with $V_N' = V_N \cup U \cup \{S', F\}$
(disjoint union!):

• start matrices $(S' \to uS)$ for all $u \in U$;

• termination matrices $(u \to \lambda, A_1 \to F, \dots, A_n \to F)$ for every $u \in U$.

• For every $u_i, u' \in U$ with $(u_i, g, u') \in E$ we take the matrix $(A \to w, u_1 \to F,
\dots, u_{i-1} \to F, u_{i+1} \to F, \dots, u_k \to F, u_i \to u')$ for $h(u_i) = \{A \to w\}$.

• For every $u, u' \in U$ with $(u, r, u') \in E$ we define a matrix $(u \to u', A \to F)$ for
$h(u) = \{A \to w\}$.

So, P', P' and h' are implicitly defined. Observe that to a sentential form con- 
taining at least one nonterminal, one of the (termination or start) matrices is 
always applicable, so that the first component of each matrix is never applied in 
ac manner. In the non-erasing case, one can apply lemma E ^ 

We defer a summary of the obtained results to Section oi since all our 
reasonings transfer to that case, too. 

4 Leftmost Derivations 

First, let us discuss what we could mean with the term “leftmost derivation” in 
the context of graph-controlled grammars and their specializations. Observe that 
all these definitions coincide in the case of unregulated context-free grammars. 

4.1 Leftmost Derivations of Type 1 

This mechanism is the strongest form of leftmost interpretation [3, p. 54]: At
each step of a derivation, the leftmost occurrence of a nonterminal is rewritten.

Observe how the notion of applicability of a rule is affected by this definition: 
A rule is not applicable if its left-hand side does not equal the leftmost nontermi- 
nal symbol in the current sentential form. If a (set of) rule(s) is not applicable, 
the derivation proceeds according to one of the four modes introduced above. 
(This has been left a bit unclear in earlier works [511 4] in our opinion.) 
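
The applicability test for leftmost derivations of type 1 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: sentential forms are lists of symbols, a rule is a pair (lhs, rhs), and the names `leftmost_nonterminal`, `apply_left1`, `NONTERMINALS` and `FORM` are hypothetical. The control graph and the handling of non-applicable rules by the four modes are left out.

```python
def leftmost_nonterminal(form, nonterminals):
    """Return the index of the leftmost nonterminal in `form`, or None."""
    for i, symbol in enumerate(form):
        if symbol in nonterminals:
            return i
    return None


def apply_left1(form, rule, nonterminals):
    """Apply `rule` = (lhs, rhs) under the left-1 interpretation.

    Returns the new sentential form, or None when the rule is not
    applicable, i.e. when lhs is not the leftmost nonterminal of `form`.
    """
    lhs, rhs = rule
    i = leftmost_nonterminal(form, nonterminals)
    if i is None or form[i] != lhs:
        return None
    return form[:i] + list(rhs) + form[i + 1:]


NONTERMINALS = {"S", "A", "B"}   # hypothetical nonterminal alphabet
FORM = ["a", "A", "b", "B"]      # hypothetical sentential form aAbB

print(apply_left1(FORM, ("A", ["a", "A"]), NONTERMINALS))
print(apply_left1(FORM, ("B", ["b"]), NONTERMINALS))
```

In the second call the rule for B is rejected although B occurs in the form, because the leftmost nonterminal is A.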

Theorem 7. Let $X \in \{G, TV, P, M[S], r[S]C\}$, $m \in \{\lambda, c, s_1, s_2\}$, $Y \in \{CF,
CF-\lambda\}$, $Z \in \{ac, ut, \lambda\}$. Then, $\mathcal{L}(X_m, Y, Z, \text{left-1}) = \mathcal{L}(CF)$.

Proof. It is clear that each context-free language can be obtained by any of
the regulation mechanisms previously introduced when working under leftmost
derivations of type 1. So, we only have to show that graph-controlled grammars
working under leftmost derivations of type 1 are not more powerful than
context-free grammars. We give a direct proof in the following.

^ Observe that the corresponding textbook proofs [3, Lemmas 1.4.1&2] are indirect
since they rely on non-trivial properties of type-0 grammars under leftmost
derivation, an area which has been investigated by several authors around 1970, see the
discussion following U Corollary 7]. 
Let $G = (V_N, V_T, P, S, \Gamma, \Sigma, \Phi, h)$ with $\Gamma = (U, E)$ be a G grammar. We
construct a context-free grammar $G'_m = (V_N', V_T, P'_m, S_0)$ (in a triple-construction
manner) with $V_N' = U \times V_N \times U \cup \{S_0\}$, and $P'_m$ consists of the following rules
(only in one place we have to distinguish the possible derivation modes $\Rightarrow_m$ with
$m \in \{\lambda, c\}$ of $G$):

1. $S_0 \to (u, S, u')$, if $u \in \Sigma$, $u' \in \Phi$.

Now, let $A \to w \in h(u)$.

2. Let (m, r, u”) G E and assume, in case of the =i>-mode, that A is not left-hand 

side of any rule in u; then take (u. A, u') — > {u" , A, u') into P'; or assume, in 
case of the =l>c-inode, that A is not left-hand side of one rule in u; then take 
{u,A,u') {u" ,A,u') into P^. 

3. If w G tfj, then put (rt, A,u) ^ w into P^. 

4. Assume further that w = X 0 B 1 X 1 B 2 ■ ■ ■ Xr-iBrXr with Xi G and Bi G Vn- 
Put (uo,A,Ur) Xo(uo, Bi,Ui)xi(ui, B2,U2) ■ ■ • Xr-l(Ur-l, Br,Ur)Xr iutO 
Pi^, where all Ui G U and (uo,g,Ur) G E. 

It is easy to see that the language L generated by G working under leftmost
derivations of type 1 in modes $\Rightarrow$ or $\Rightarrow_c$, respectively, equals the language
obtained via $G'$ or $G'_c$, respectively.

For the set modes, we can apply the construction given in Lemma 5.

In the literature, the only cases of the preceding theorem that have been
treated are M and P grammars with the $\Rightarrow_c$ derivation mode.

4.2 Leftmost Derivations of Type 2 

Again, we quote the general definition from [3]: At each step of a derivation,
the leftmost occurrence of a nonterminal which can be rewritten (note that in
regulated grammars only certain nonterminal occurrences can be rewritten in a
given stage of derivation) is rewritten. Grammar derivations falling into this
category have been investigated in 

This general definition leaves lots of room for interpretation. 

Our general idea of leftmost derivations of type 2 will be to define the set of 
“nonterminals which can be rewritten” according to our four derivation modes 
separately. This will also allow us to treat ac according to the definitions given 
above, an issue which has not been tackled in this setting before. 

The derivation mode $\Rightarrow_c$: The current state is a pair $(x, u)$, where $u$ is some
node and $x$ a sentential form. Then, we choose a rule $A \to w \in h(u)$. So, there is
just one “nonterminal which can be rewritten”, namely $A$. (Therefore, we leave
the discussion of this mode to Section 4.3.) If $A$ is contained in $x$, we replace $A$’s
leftmost occurrence by $w$ and proceed via one of the green arcs leaving node $u$.
Otherwise, we can apply $A \to w$ in ac mode, so that we leave node $u$ by one of
its outgoing red arcs.

The derivation mode $\Rightarrow$: Consider again a current state $(x, u)$. Now, the
set $N$ of “nonterminals which can be rewritten” is the set of left-hand sides
of rules in $h(u)$. If some symbol from $N$ occurs in $x$, we look for the leftmost
symbol $A \in N$ occurring in the form $x$. We apply some rule $A \to w \in h(u)$
to this leftmost occurrence of $A$. Then, we proceed via one of the green arcs
leaving node $u$. If no symbol from $N$ is contained in $x$, leave node $u$ by one of
its outgoing red arcs.
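
A single left-2 step at one node, for the mode in which every rule attached to the node is a candidate, might look as follows in Python. This is a minimal sketch, not the construction of the paper: forms are lists of symbols, `node_rules` stands for the rule set of the current node, and the function name `left2_step` as well as the returned arc colours as strings are assumptions made for illustration. The choice among several rules with the same left-hand side is fixed to the first one here, whereas the grammar may choose any of them.

```python
def left2_step(form, node_rules):
    """One left-2 step at a node whose rule set is `node_rules`.

    `node_rules` is a list of (lhs, rhs) pairs. Returns ("green", new_form)
    when a rule was applied and ("red", form) when no left-hand side of the
    node occurs in `form`.
    """
    rewritable = {lhs for lhs, _ in node_rules}   # the set N of the text
    for i, symbol in enumerate(form):
        if symbol in rewritable:
            # Assumption: take the first matching rule; the grammar itself
            # may choose any rule of the node with this left-hand side.
            rhs = next(r for l, r in node_rules if l == symbol)
            return "green", form[:i] + list(rhs) + form[i + 1:]
    return "red", form


FORM = ["A", "B", "C"]   # hypothetical sentential form ABC
print(left2_step(FORM, [("C", ["c"]), ("B", ["b", "B"])]))
print(left2_step(FORM, [("D", ["d"])]))
```

In the first call the nonterminal A is skipped because no rule of the node rewrites it, so the leftmost rewritable symbol is B. This is exactly the difference from the left-1 interpretation.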

The derivation modes $\Rightarrow_{s_1}$ and $\Rightarrow_{s_2}$: Now, the current state is a pair $(x, V)$,
where $V$ is some node set and $x$ a sentential form, and the set $N$ of “nonterminals
which can be rewritten” is the set of left-hand sides of rules in $\bigcup_{u \in V} h(u)$. If some
symbol from $N$ occurs in $x$, we look for the leftmost symbol $A \in N$ occurring in
the form $x$. We apply some rule $A \to w \in h(u)$ (for some $u \in V$) to this leftmost
occurrence of $A$. Then, we proceed via one of the green arcs leaving node $u$. If
no symbol from $N$ is contained in $x$, the set of next nodes is defined via the
outgoing red arcs of one (all, respectively) nodes in $V$.

In case of programmed grammars without ac, this coincides with the usual 
meaning of “leftmost derivation of type 2” . This motivated the introduction of 
the set derivation modes. The following is known from the literature [3, Theorem 
1.4.3] and Theorem 7] regarding programmed grammars: 

Theorem 8. Let $s \in \{s_1, s_2\}$, and let $Z \in \{ut, ac, \lambda\}$. Then, we know:

(1) $\mathcal{L}(P_s, CF-\lambda, Z, \text{left-2}) = \mathcal{L}(CS)$; (2) $\mathcal{L}(P_s, CF, Z, \text{left-2}) = \mathcal{L}(RE)$.



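Lemma 9. Let $m \in \{\lambda, c, s_1, s_2\}$, and $Z \in \{ut, ac, \lambda\}$. Then, we obviously have:
(1) $\mathcal{L}(G_m, CF-\lambda, Z, \text{left-2}) \subseteq \mathcal{L}(CS)$; (2) $\mathcal{L}(G_m, CF, Z, \text{left-2}) \subseteq \mathcal{L}(RE)$.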

Observe that this definition of left-2 derivations in matrix grammars (in any 
derivation mode) deviates from the classical one given in |S| inspired by Salomaa 
j fn^ . The “set modes” correspond to the wl-mode introduced in p[j . Since matrix 
grammars are matrix set as well as regularly controlled (set) grammars, we get 
results analogous to the previous theorem for those language classes as well. For 
the proof in case of unconditional transfer, ideas from the proofs of m Theorem 
7] and Lemma 4.3] can be adapted. So, we are left with TV grammars and 
with the $\Rightarrow$-mode of derivation in order to complete the picture.

Theorem 10. Let $Z \in \{ut, ac, \lambda\}$, $m \in \{\lambda, s_1, s_2\}$. We can prove:

(1) $\mathcal{L}(TV_m, CF-\lambda, Z, \text{left-2}) = \mathcal{L}(CS)$; (2) $\mathcal{L}(TV_m, CF, Z, \text{left-2}) = \mathcal{L}(RE)$.

Proof. By the previous lemma, we need to worry about the inclusions $\supseteq$. Further
observe that $s_1$ and $s_2$ mode coincide for TV grammars. When ac is not
involved, we show how to simulate a programmed grammar by a time variant one
in the following. Let $G = (V_N, V_T, P, S, \Gamma, h)$ with $\Gamma = (U, E)$ be a $P_{s_1}$ grammar
without ac. We construct a TV grammar $G' = (V_N', V_T, P', S', \Gamma', \Sigma, h')$
without ac simulating the left-2 derivation within $G$. (First, we give a construction
admitting erasing rules.) Let $\Gamma' = (U' = \{v_1, \dots, v_{2r+2}\}, E')$ possess a ring
structure, i.e., $(v_i, g, v_j) \in E'$ iff $j = i \bmod (2r+2) + 1$. Let $\Sigma = \{v_{2r+2}\}$,
$V_N' = V_N \cup V_N \times U \cup \{[u, j] \mid u \in U, 1 \le j \le 2\} \cup \{S'\}$. The rules of $G'$ are:



^ The construction of 0 Lemma 2.1.1] is not applicable here. 
• $h'(v_i) = \{[u, 1] \to [u, 1] \mid u \ne u_i\} \cup \{A \to (A, u) \mid (u_i, g, u) \in E\}$ for $1 \le i \le r$;

• $h'(v_{r+1}) = \{[u, 1] \to [u', 2] \mid (u, g, u') \in E\}$;

• $h'(v_{r+1+i}) = \{[u, 2] \to [u, 2] \mid u \ne u_i\} \cup \{(A, u_i) \to w \mid h(u_i) = \{A \to w\}\}$ for
$1 \le i \le r$;

• $h'(v_{2r+2}) = \{[u, 2] \to [u, 1], [u, 2] \to \lambda \mid u \in U\} \cup
\{S' \to [u, 1] w \mid h(u) = \{S \to w\}\}$.

In the non-erasing case, observe that the corresponding P grammars characte- 
rize the context-sensitive languages which are a trio. Moreover, the time-variant 
grammars obviously generate a language class which is closed under union. So, we 
can apply lemma ^ changing the newly introduced erasing rules into [u, 2] ^ a 
for some terminal a. Details of the case of unconditional transfer are omitted. 

The picture of the leftmost-2 world is completed in the next theorem. 

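Theorem 11. Let $X \in \{G, TV, MS, rSC\}$, $X' \in \{P, M, rC\}$, $m \in \{\lambda, s_1, s_2\}$,
$s \in \{s_1, s_2\}$, $Y \in \{CF, CF-\lambda\}$, $Z \in \{ac, ut, \lambda\}$. Then,

1. $\mathcal{L}(X_m, CF-\lambda, Z, \text{left-2}) = \mathcal{L}(X'_s, CF-\lambda, Z, \text{left-2}) = \mathcal{L}(CS)$;

2. $\mathcal{L}(X_m, CF, Z, \text{left-2}) = \mathcal{L}(X'_s, CF, Z, \text{left-2}) = \mathcal{L}(RE)$.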

Proof idea. By what we have shown up to this point, we have to treat the
MS case in combination with the $\Rightarrow$-mode. It is not too difficult to show the
inclusion $\mathcal{L}(P_s, Y, \text{left-2}) \subseteq \mathcal{L}(MS, Y, \text{left-2})$ for $Y \in \{CF, CF-\lambda\}$. □



4.3 Leftmost Derivations of Type 3 

Again, we start with the definition from [3]: to use each rule in a leftmost manner,
i.e., the leftmost appearance of its left-hand member is rewritten. First, we link
left-3 derivations with the left-2 derivation cases which are not classified yet.

Lemma 12. Let $X \in \{G, TV, MS, rSC, P, M, rC\}$, $Y \in \{CF, CF-\lambda\}$, $Z \in
\{\lambda, ac, ut\}$. Then, we have $\mathcal{L}(X_c, Y, Z, \text{left-2}) = \mathcal{L}(X_c, Y, Z, \text{left-3})$.

The next theorem links the different left-3 classes. Proofs are always the same 
as in case of free derivations. Especially, we can refer to the fourth and fifth 
section of [H3 regarding modes => and =^c and the preceding section regarding 
the set modes. Many open questions in this respect are contained in El- 

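Theorem 13. Let $Y \in \{CF, CF-\lambda\}$, $Z \in \{\lambda, ac\}$, and $m \in \{\lambda, c, s_1, s_2\}$. Then:

1. $\mathcal{L}(P, Y, Z, \text{left-3}) = \mathcal{L}(X_m, Y, ac, \text{left-3})$ for $X \in \{G, M, MS, rC, rSC, TV, P\}$.

2. $\mathcal{L}(P, Y, ac, \text{left-3}) = \mathcal{L}(X_m, Y, ut, \text{left-3}) = \mathcal{L}(X'_s, Y, ut, \text{left-3})$ for $m \ne c$, $s \in
\{s_1, s_2\}$, $X \in \{G, MS, rSC, TV\}$, $X' \in \{rC, M, P\}$.

3. $\mathcal{L}(P, Y, ut, \text{left-3}) = \mathcal{L}(X_c, Y, ut, \text{left-3}) = \mathcal{L}(X'_m, Y, ut, \text{left-3})$ for $m \in \{\lambda, c\}$,
$X \in \{G, MS, rSC, TV\}$, $X' \in \{rC, M, P\}$. □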
Finally, observe the following relations we have shown in solving some
open problems contained in 

Theorem 14. $\mathcal{L}(P, CF[-\lambda]) \subseteq \mathcal{L}(P, CF[-\lambda], \text{left-3}) \subseteq \mathcal{L}(P, CF[-\lambda], ac, \text{left-3})$. □

5 Beyond the Framework: Conditional Grammars 

For conditional grammars (K grammars) we refer to |n|- We restrict ourselves 
mainly to regular conditions. Leftmost derivations have not been considered for 
these grammars in the literature. It is clear what a K grammar with leftmost 
derivations of type 1 and 3 should be. As regards left-2 derivations, note that
those rules $(\alpha \to \beta, Q)$ with $w \in Q$ are applicable to $w \in (V_N \cup V_T)^*$.

Lemma 15. If $Y \in \{CF-\lambda, CF\}$, $x \in \{2, 3\}$, then $\mathcal{L}(K, Y) \subseteq \mathcal{L}(K, Y, \text{left-}x)$.

Proof idea. In principle, a variant of the well-known “colouring trick” will work
for the simulation of free derivations by leftmost derivations. □

Theorem 16. Let $Y \in \{CF-\lambda, CF\}$. Then, we find $\mathcal{L}(Y) = \mathcal{L}(K, Y, \text{left-1})$.

For reasons of space, we omit the rather tricky construction of this proof. We
found a direct simulation as in Theorem 7. In conclusion, K grammars are also
a means for describing levels of the Chomsky hierarchy in a context-free style.

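Corollary 17. Let $x \in \{2, 3\}$. We can show the following: $\mathcal{L}(K, CF[-\lambda], \text{left-1}) =
\mathcal{L}(CF) \subset \mathcal{L}(K, CF-\lambda[, \text{left-}x]) = \mathcal{L}(CS) \subset \mathcal{L}(K, CF[, \text{left-}x]) = \mathcal{L}(RE)$. □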

Are there any context-free style derivation mechanisms whose left-1 inter- 
pretation yields non-context-free languages? We consider conditional grammars 
with non-regular context conditions: For example, $\{a^m b^m a^m \mid m \ge 1\}$ is
generated by $G = (\{S, T, T'\}, \{a, b\}, P, S)$ with rules

$(S \to aS, \{a, b\}^* S)$, $(S \to bS, \{a, b\}^* S)$, $(S \to T, \{a, b\}^* S)$,

$(T \to T', \{a^m b^m \mid m \ge 1\} a^*)$, $(T' \to \lambda, a^* \{b^m a^m \mid m \ge 1\})$.

Paun and Urbanek proved that the family of languages generable by K 

grammars with regular core rules and context-free rule conditions strictly con- 
tains the intersection closure of the context-free languages. Clearly, that family 
is contained in the family of languages generable by K grammars with linear core 
rules and context-free rule conditions, which is included in the family of langua- 
ges generable by K grammars with context-free core rules and context-free rule 
conditions working under left-1 interpretation. Is any of those inclusions
strict?

^ When admitting erasing rules, those results can be alternatively derived combining 
results from 15181121201 . More precisely, Virkkunen showed in EDI Lemma 5] how 
to characterize scattered context languages as morphic images of intersection of 
unordered scattered context languages with left-3 derivations and regular sets. 
Abstract. Formal methods based on the mathematical theory of partially
ordered sets (i.e., posets) have been used in the database field for
the modelling of spatial data for many years. In particular, the use of
the lattice completion (or normal completion) of a poset has been shown 
by Kainz, Egenhofer and Greasley to be a fundamental technique 
to build meaningful representations of spatial subdivisions. In fact, they 
proved that the new elements introduced by the normal completion pro- 
cess can (and have to) be interpreted as being the intersection of poset 
elements. This is fundamental, from a mathematical point of view, since 
it means that the lattice resulting from the normal completion is the clo- 
sure of the given poset with respect to the intersection operation. In this 
paper we precisely clarify the limitations for the use of lattices as models 
for spatial subdivisions, by proving sufficient and necessary conditions. 
Our result gives therefore a sound theoretical basis for the use of lattices 
built on simplicial complexes as a data model for spatial databases. 



1 Introduction 

A class of sets together with a set-containment relation among them models 
many common situations in spatial databases. For example it may represent a 
containment relation between geographical objects of the plane or a hierarchical 
relation between administrative units. The set-containment relation is a partial 
order relation. Formal methods based on the mathematical theory of partially 
ordered sets (i.e. posets) have been used for the description of spatial relations 
since many years cnm. 

In particular, the use of the lattice completion (or normal completion) of 
a poset has been shown by Kainz, Egenhofer and Greasley m to be a fun- 
damental technique to build meaningful representation of spatial subdivisions. 
They proposed to represent the set-intersection between sets of the class by
means of the elements introduced by the normal completion operator. Consider for
example the class of sets S containing the four sets A, B, C, and D shown in
Fig. 1.

Each set is represented with an elliptic shape filled with a different pattern. 
Zones filled with more than one pattern belong to more than one set. 

We can represent the class S with the poset P shown in Fig. 2 left.
Fig. 1. A class of sets with a set containment relation




Fig. 2. (left) A poset representation of the class of sets of Fig. 1. (right) The normal
completion of the poset in Fig. 2 (left)



Now suppose we want a representation of the closure S' of the class S with
respect to the set intersection operator (i.e. the class obtained by intersecting each
possible pair of sets taken from S). Such a closure is composed of the sets
contained in S plus the set $A \cap B$ and the empty set. The set $A \cap B$ is contained
in the sets A and B and it contains the sets C and D. The normal completion
of poset P is the lattice L, shown in Fig. 2 right. The lattice L is composed of
the elements of P plus a top and bottom element, and a new element labeled E,
which is smaller than the elements (representing the sets) A and B and greater
than those (representing the sets) C and D. Therefore, since the relation of the set
$A \cap B$ with respect to the other sets of S is analogous to that of the element E with
respect to the other elements of L, the lattice L can represent the class of sets S',
provided that the element E represents the set $A \cap B$.

In the general case, however, using the normal completion operator to
represent the set-intersection operator may lead to incorrect results, as the following
example shows. In Fig. 3 a class S of sets with a set containment relation and
its closure S' with respect to the set intersection operator are represented.
Fig. 3. A class of sets with a set containment relation



The class S is composed of the sets labeled as A, B, C, D, E, $\emptyset$ and T, where
$T = \bigcup_{x \in S} x$ is the greatest set of the class and it is not represented in Fig. 3. As
Fig. 3 shows, the class S' is composed of the sets contained in S plus the sets
$A \cap B$, $B \cap C$ and $A \cap C = A \cap B \cap C$. A poset P representing the class S is shown
in Fig. 4 left. If we build the normal completion of P, we obtain the lattice in
Fig. 4 right. The newly created element X is the greatest lower bound of the
elements A and B, hence it should represent the set $A \cap B$. However X is also the
greatest lower bound of the elements B and C, hence it should represent the set
$B \cap C$, but as Fig. 3 shows, $A \cap B$ and $B \cap C$ are different sets and consequently
it is incorrect to represent them by the same poset element.
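
The phenomenon can be reproduced on a small hand-made class. The following Python sketch does not use the sets of Fig. 3 (which are not recoverable here); the element names and the sets A, B, C, D, E, T are hypothetical and only chosen so that A ∩ C = A ∩ B ∩ C while A ∩ B and B ∩ C differ. It shows that, inside the class, the pairs {A, B} and {B, C} have exactly the same lower bounds, so a construction that sees only the containment order cannot tell the two intersections apart.

```python
def lower_bounds(pair, cls):
    """Members of the class `cls` contained in both sets of `pair`."""
    a, b = pair
    return {s for s in cls if s <= a and s <= b}


# Hypothetical class: D and E are two small disjoint sets contained in A, B and C.
D = frozenset({"d"})
E = frozenset({"e"})
A = frozenset({"d", "e", "a", "x"})
B = frozenset({"d", "e", "x", "y"})
C = frozenset({"d", "e", "y", "c"})
T = A | B | C                       # greatest set of the class
S = {frozenset(), D, E, A, B, C, T}

print(A & B != B & C)                                    # the intersections differ
print(A & C == A & B & C)                                # as in the text
print(lower_bounds((A, B), S) == lower_bounds((B, C), S))
```

Since neither D nor E is the greatest common lower bound, the completion has to add one new element for this common set of lower bounds, and that single element would have to stand for both intersections.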




Fig. 4. (left) A poset representation of the class of sets of Fig. 3. (right) The normal
completion of the poset in Fig. 4 (left)



In Fig. 5 we see a correct representation of S'. We have built the poset in Fig.
5 starting from Fig. 3 (that shows the class S) and not from Fig. 4 left (the
poset representation of S). In fact the poset in Fig. 4 left does not provide
enough information: for example, inspecting Fig. 3 we see that the sets $A \cap C$ and
$A \cap B \cap C$ are the same set, but the poset in Fig. 4 left cannot carry this information.
If $A \cap B \cap C$ was strictly contained in $A \cap C$ (see the example in Fig. 6), then the
poset in Fig. 4 left would still be, without any modification, a representation of
this different class.

This fact shows that to represent set intersection operator by means of poset 
operator we have to provide more information to our representation. A way to do 
Fig. 5. A representation of the closure with respect to set intersection operator of the
class of sets in Fig. 3







Fig. 6. A class of sets with a set containment relation 



this is to include in the class S a spatial subdivision of the whole domain on which 
S is defined. In a poset that represents such a class there is an element for each 
of the atomic units of the spatial subdivision of S. We show that when a class S 
includes a spatial subdivision, the normal completion of its poset representation 
is a correct representation of S'. This was the case discussed by Kainz, Egenhofer
and Greasley, since they modeled spatial regions by means of simplicial
complexes, which naturally include a spatial subdivision.

We highlight in this paper the fundamental role played by the presence of 
a spatial subdivision for a correct use of the normal completion operator. We 
give necessary and sufficient conditions for a correct use of lattices as models for 
spatial relations. 

The use of posets as a modelling structure for realities in spatial databases 
is largely widespread Also, a discrete basis for the sets of the 

class, which is analogous to the universal partition we introduce in Sect. 2, is 
commonly used in the modelling of geometrical entities [fril l l)j . Such a discrete 
basis is indeed the starting point for many efficient data structure based on a 
space-partitioning criteria, e.g. quadtree grid-file El, k-d tree [3|, cell-tree 
0. Normal completion plays a central role in posets operations uni Various stu- 
dies has been conducted to develop efficient algorithms for its construction. The 
most interesting, in our opinion, are Efficient representation techni-
ques for posets have been developed in ma. With reference to the use of posets
to model spatial databases, in an incremental algorithm to build the normal 
completion of a poset is given. But the issue of how to interpret the new elements 
inserted for the completion with respect to the reality of interest is left open. 

We close this section with a brief summary of the rest of this paper. In Sect. 
2 we introduce formally the definitions of closure of a class of sets with a set- 
containment relation with respect to set-intersection, of representation of a class 
of sets and of universal partition. Section 3 is dedicated to the study of the 
representation of the closure of a class with respect to set-intersection. 

2 Representations and Closures

In this section we define formally what we mean by closure of a class of sets with 
respect to a certain set operator, and what we mean by representation of a class 
of sets with a set-containment relation by means of a poset. We also introduce in 
this section the concept of universal partition of a class S with a set-containment 
relation. It will be used in later sections as a tool to operate efficiently on sets 
belonging to S and on sets belonging to closures of S. 

We consider only finite classes, i.e. classes containing a finite number of 
sets. For technical reasons it is useful to work with classes of sets with a set- 
containment relation that contain a greatest set (namely a set that contains 
every other set of the class) and a least set (namely a set that is contained in 
every other set of the class). This is not a restriction: if a finite class of sets
lacks a greatest or a least set, we can always extend it by adding, respectively,
the union of all the sets of the class or the empty set, and then work with
the extended class. From now on, when we speak of a class of sets with a
set-containment relation, we always refer to the extended class. All results proved
in this section are almost straightforward, hence proofs are omitted.

Definition 1. Let S be a class of sets with a set-containment relation. We define
$S^\cap$, the closure of S with respect to the set-intersection operator, by the following
rules:

1. if $s \in S$ then $s \in S^\cap$;

2. $\forall s_1, s_2 \in S^\cap$, $s_1 \cap s_2 \in S^\cap$.
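
For a finite class, the two rules of the definition amount to a fixed-point iteration: keep adding pairwise intersections until nothing new appears. A minimal Python sketch follows, assuming sets are given as Python sets of hashable items; the function name `intersection_closure` and the example sets are hypothetical and only mimic the situation of four sets A, B, C, D in which C and D lie inside the overlap of A and B.

```python
from itertools import combinations


def intersection_closure(sets):
    """Smallest class containing `sets` and closed under pairwise intersection."""
    closure = {frozenset(s) for s in sets}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot, since `closure` grows inside the loop.
        for a, b in combinations(list(closure), 2):
            if a & b not in closure:
                closure.add(a & b)
                changed = True
    return closure


A, B, C, D = {1, 2, 3, 4}, {2, 3, 4, 5}, {2}, {3}   # hypothetical sets
closure = intersection_closure([A, B, C, D])
print(sorted(sorted(s) for s in closure))
```

On this input the closure consists of the four given sets plus their overlap {2, 3, 4} and the empty set (which arises as the intersection of C and D).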

To build correctly the closure of S we need to perform aggregations and 
subdivisions of sets. For this aim we make use of a universal partition, a subclass 
of S containing sets that act as building blocks for every other set of S (i.e. every 
set of S can be obtained applying the set-union operator to a suitable collection 
of sets of the universal partition) . 

Definition 2. Let S be a class of sets with a set-containment relation, and let
$U_S \subseteq S$. We say that $U_S$ is a universal partition of S iff $\forall r_1, r_2 \in U_S$, we have
$r_1 \cap r_2 = \emptyset$, and $\forall s \in S$ there exist $r_1, r_2, \dots, r_n \in U_S$ such that $s = \bigcup_i r_i$.
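
The two conditions of the definition can be checked mechanically on a finite class. The sketch below is an illustration under stated assumptions: disjointness is only required for distinct blocks, the empty set is regarded as the union of an empty collection of blocks, and the name `is_universal_partition` and the example class are hypothetical.

```python
from itertools import combinations


def is_universal_partition(blocks, cls):
    """Check whether `blocks` is a universal partition of the class `cls`."""
    blocks = {frozenset(r) for r in blocks}
    cls = {frozenset(s) for s in cls}
    if not blocks <= cls:                         # the partition is a subclass
        return False
    if any(a & b for a, b in combinations(blocks, 2)):
        return False                              # two distinct blocks overlap
    for s in cls:
        inside = [r for r in blocks if r <= s]    # blocks contained in s
        if frozenset().union(*inside) != s:
            return False                          # s is not a union of blocks
    return True


CLASS = [set(), {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 2, 3}]   # hypothetical class
print(is_universal_partition([{1}, {2}, {3}], CLASS))
print(is_universal_partition([{1, 2}, {3}], CLASS))
```

The second candidate fails because the set {1} of the class cannot be obtained as a union of the blocks {1, 2} and {3}.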

To associate to each set of the class its building blocks (i.e. the collection of 
sets of the universal partition that compose the set) we define a mapping. 
Definition 3. Let S be a class of sets with a set-containment relation and a
universal partition $U_S$. We define the mapping $SBase : S \to 2^{U_S}$ as

$SBase(s) = \{r \in U_S \mid r \subseteq s\}$.

The following proposition shows that for each set s of a class of sets with a
set-containment relation, there exists a unique collection of sets of the universal
partition whose set-union is equal to s, and that this collection is exactly $SBase(s)$.

Proposition 1. If S is a class of sets with a set-containment relation and a
universal partition $U_S$, then for each $s \in S$ there exists a unique set
$\{r_1, r_2, \dots, r_n\} \in 2^{U_S}$ such that $s = \bigcup_i r_i$. Also $\forall s \in S$, $s = \bigcup_{r \in SBase(s)} r$.
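
The base mapping and the second claim of the proposition are easy to try out. The following Python sketch is illustrative only: the function name `sbase`, the blocks and the class are hypothetical, and the blocks are assumed to form a universal partition of the class.

```python
def sbase(s, blocks):
    """Blocks of the universal partition that are contained in the set s."""
    s = frozenset(s)
    return {frozenset(r) for r in blocks if frozenset(r) <= s}


BLOCKS = [{1}, {2}, {3}]                                  # hypothetical partition
CLASS = [set(), {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 2, 3}]

print(sbase({1, 2}, BLOCKS))
# Every set of the class is the union of its base.
print(all(frozenset().union(*sbase(s, BLOCKS)) == frozenset(s) for s in CLASS))
```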

The universal partition $U_S$ of a class S of sets with a set-containment relation
is also a universal partition of $S^\cap$.

Corollary 1. Let S be a class of sets with a set-containment relation and a
universal partition $U_S$. Then $U_S$ is a universal partition of $S^\cap$.

Thanks to the corollary above, we can apply Definition 3 also to $S^\cap$.

Now we define formally what is a representation by means of a poset of a 
class of sets with a set-containment relation. 

Definition 4. Let S be a class of sets with a set-containment relation and let
$\langle P, \le \rangle$ be a poset. We say that P is a representation of S if there exists an
isomorphism between S and P.

In the rest of this paper, every time we deal with a representation P of a class
S of sets with a set-containment relation, we refer to the isomorphism between S
and P as $Rep : S \to P$. Of course there exists $Rep^{-1} : P \to S$. Note that since
the classes of sets with a set-containment relation we consider have a greatest
and a least set, their representations have a greatest and a least element.

In a representation of a class of sets with a set-containment relation and 
a universal partition, we need to identify the representants of the sets of the 
universal partition. 

Definition 5. Let S be a class of sets with a set-containment relation and a
universal partition $U_S$, and let P be a representation of S. We define universal
partition on P the set $U_P = \{x \in P \mid x = Rep(r) \text{ and } r \in U_S\}$.

Also in the representation we need to refer to representants of the sets of
the universal partition whose set-union is a given set. Therefore we introduce
the mapping $PBase(\cdot)$ from elements in P to subsets of the universal partition
defined on P.

Definition 6. Let S be a class of sets with a set-containment relation and a
universal partition $U_S$, and let P be a representation of S. For each $p \in P$, we
define the mapping $PBase : P \to 2^{U_P}$ as

$PBase(p) = \{x \in P \mid x = Rep(r) \text{ and } r \in SBase(Rep^{-1}(p))\}$.
In a class of sets with a set-containment relation and a universal partition,
a set is ’composed’ of sets of the universal partition by means of the set-union
operator. In the representation of the class an analogous ’composition’ is obtained
by means of the $lub(\cdot)$ operator that assigns to each subset of a poset its least
upper bound, as the following theorem shows.

Theorem 1. Let S be a class of sets with a set-containment relation and a
universal partition. If P is a representation of S then for each $s \in S$ we have:

$Rep(s) = lub(\{y \mid y = Rep(r) \text{ and } r \in SBase(s)\})$.

If one thinks of $PBase(\cdot)$ as a mapping between the posets $\langle P, \le \rangle$ and
$\langle 2^{U_P}, \subseteq \rangle$, the previous proposition translates into the following corollary:

Corollary 2. The mapping $PBase(\cdot)$ is an order embedding from the poset
$\langle P, \le \rangle$ to the poset $\langle 2^{U_P}, \subseteq \rangle$.

3 Representation of Set-Intersection Closure

3.1 Introduction 

In this section, given a class S of sets with a set-containment relation and its
representation P, we study how to derive from P a representation of $S^\cap$, the
closure of S with respect to the set-intersection operator. Before we proceed with
formal investigations on this subject, let us see how the existence of a universal
partition modifies the example presented in Fig. 3. In Fig. 7 we show a class S of
sets containing five sets A, B, C, D, E which have exactly the same containment
relations as the regions in Fig. 3. But the class also contains a universal partition,
whose elements coincide with the unit squares of the grid. Sets A, B, C, D, E are
shown as aggregations of unit squares identified by different patterns.

A poset representation P for this class of sets is shown in Fig. 8 (the top and
the bottom of the poset have been omitted for clarity). We want to construct a
representation of $S^\cap$, namely a representation which contains also elements that
represent sets $A \cap B$, $B \cap C$ and $A \cap B \cap C$.

Comparing P with the poset in Fig. 4 left we can see that the universal
partition provides information on the class S that was missing in the poset
in Fig. 4 left. For example elements 1d, 2d and 3d represent regions contained
in both sets A and B but not in set C. This fact means that $A \cap B$ and
$A \cap B \cap C$ are different sets. Figure 9 shows the normal completion M(P) of poset
P (in Fig. 9 also, the top and the bottom of the lattice have been omitted for
clarity). Inspecting Fig. 9 (and recalling Fig. 7) we can see that M(P) is a
correct representation of class $S^\cap$ since elements labeled X, Y and Z represent
respectively sets $A \cap B$, $B \cap C$ and $A \cap B \cap C$. This is a general fact, as we show
formally in the following subsection.
Fig. 7. A class of sets with a set containment relation




Fig. 8. A poset representation of the class of sets of Fig. 7



3.2 Sufficient Conditions for Representation of Set-Intersection 
Closure 

Proofs of results in this section have been omitted since they are either almost 
straightforward or rather technical. They can be found in the extended version 
HZ). The following theorem tells us that given a representation with a universal 
partition, the greatest lower bound of the representants of two sets represents,
if it exists, the intersection between the two sets.

Theorem 2. Let S be a class of sets with a set-containment relation and let P be
its representation. Assume P has a universal partition $U_P$. For every $x_1, x_2 \in P$,
if there exists $x_0 = glb(x_1, x_2)$, then

$Rep^{-1}(x_1) \cap Rep^{-1}(x_2) = Rep^{-1}(x_0)$.



Fig. 9. The normal completion of the poset in Fig. 8

The previous theorem suggests that given a class S of sets with a set-containment
relation, in order to provide a representation for the intersection of every subclass
of S (i.e. to provide a representation for $S^\cap$) we need to extend the
representation of S to a poset that has a glb for every subset of its elements, namely
a lattice. Since the MacNeille completion of a poset to a lattice is the most
common way to realize such an extension (and indeed the resulting lattice has
interesting properties) we investigate the possibility of representing $S^\cap$ by
means of M(P), the MacNeille completion of P. We prove in the following that if
a universal partition of S exists, M(P) is a representation of $S^\cap$. Afterwards we
discuss what happens if a universal partition does not exist.

In Theorem 3 we will build an isomorphism between the closure of the class
S with respect to the set-intersection operation and the normal completion of
its representation.

Theorem 3. Let S be a class of sets with a set-containment relation, a universal
partition $U_S$, and a representation P. The mapping $IRep : S^\cap \to M(P)$ defined
as

$IRep(s) = (\{q \in P \mid q = Rep(r), r \in SBase(s)\}^*)_*$

is an isomorphism. Hence M(P) is a representation of $S^\cap$.
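
The statement can be checked by brute force on a small class. The Python sketch below is an illustration, not the authors' algorithm: it takes the class itself, ordered by inclusion, as its own representation (so Rep is the identity), computes the MacNeille completion as the family of cuts, i.e. sets of the form "lower bounds of the upper bounds of X", and maps each set of the intersection closure to the cut generated by its base. All function names and the example class are hypothetical, and the subset enumeration is exponential, so it is only meant for tiny examples.

```python
from itertools import combinations


def intersection_closure(sets):
    """Smallest class containing `sets` and closed under pairwise intersection."""
    closure = {frozenset(s) for s in sets}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(closure), 2):
            if a & b not in closure:
                closure.add(a & b)
                changed = True
    return closure


def cut(xs, poset):
    """Lower bounds of the upper bounds of `xs` in `poset` (ordered by inclusion)."""
    upper = [p for p in poset if all(x <= p for x in xs)]
    return frozenset(p for p in poset if all(p <= u for u in upper))


def macneille_cuts(poset):
    """All cuts of a finite poset, by brute force over its subsets."""
    poset = list(poset)
    return {cut(xs, poset)
            for k in range(len(poset) + 1)
            for xs in combinations(poset, k)}


def irep(s, blocks, poset):
    """Cut generated by the blocks of the universal partition contained in s."""
    return cut([r for r in blocks if r <= s], list(poset))


# Hypothetical class with a universal partition made of four singletons.
BLOCKS = [frozenset({i}) for i in (1, 2, 3, 4)]
A, B = frozenset({1, 2, 3}), frozenset({2, 3, 4})
S = {frozenset(), *BLOCKS, A, B, A | B}

closure = intersection_closure(S)
cuts = macneille_cuts(S)
images = {irep(s, BLOCKS, S) for s in closure}
print(len(S), len(closure), len(cuts))
print(images == cuts and len(images) == len(closure))
```

Here the closure adds the single set {2, 3} to the class, and the completion adds the single cut consisting of the empty set and the blocks {2} and {3}; the mapping sends the former to the latter and is a bijection between the closure and the cuts.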

The result of Theorem 3, in the restricted formulation for simplicial
complexes, where a universal partition always exists, was proved by Kainz,
Egenhofer and Greasley. An obvious consequence of Theorem 3 is that $\forall s_1, s_2 \in
S$, $IRep(s_1 \cap s_2) = glb(IRep(s_1), IRep(s_2))$, namely the representant of the
intersection of two sets is the glb of the representants of the sets, as we conjectured
in Sect. 1.

3.3 Necessary Conditions for Representation of Set-Intersection 
Closure 

Theorem 3 tells us that, given a class S of sets with a set-containment relation
and its representation P, the existence of a universal partition is a sufficient
condition for the isomorphism between the posets $\langle S^{\cap}, \subseteq \rangle$ and
$\langle M(P), \le \rangle$. Such a condition is not necessary, however, as the example
presented in Figs. El and El shows. In fact in that example, even though there is no
universal partition, we can build the isomorphism by representing the intersection of the
sets A and B with the new element (E) introduced in the poset by the MacNeille
completion. To find a necessary condition for the isomorphism between the posets
$\langle S^{\cap}, \subseteq \rangle$ and $\langle M(P), \le \rangle$, we can proceed in two ways:
either we carry out further investigations about the links between the two posets,
or we find additional conditions for the class S. We now investigate both alternatives.
The following definition introduces a mapping $Z : M(P) \to S^{\cap}$, which we use to
show further results for the first alternative.
Definition 7. Let S be a class of sets with a set-containment relation and let
P be a representation of S. We define the mapping $Z : M(P) \to S^{\cap}$ as

Z{x) = Pi Rep-'^{g}-^{y)) . 

The following lemma shows that the mapping Z(·) is an order embedding.

Lemma 1. The mapping $Z : M(P) \to S^{\cap}$ is an order embedding between the
posets $\langle S^{\cap}, \subseteq \rangle$ and $\langle M(P), \le \rangle$.

From the previous lemma an important result follows immediately.

Lemma 2. Let S be a class of sets with a set-containment relation and a
representation P. We have $|M(P)| \le |S^{\cap}|$, where M(P) is the MacNeille completion
of P.

Given the above lemma, a way to find a necessary condition for the existence
of an isomorphism between the posets $\langle S^{\cap}, \subseteq \rangle$ and
$\langle M(P), \le \rangle$ is to find a necessary condition for the sets $S^{\cap}$ and
M(P) to have the same cardinality. We achieve this result by means of the mapping Z(·).
The following theorem states a necessary condition for the isomorphism between the
posets $\langle S^{\cap}, \subseteq \rangle$ and $\langle M(P), \le \rangle$.

Theorem 4. Let S be a class of sets with a set-containment relation and a
representation P. If $S^{\cap}$ is isomorphic to M(P), then for all $s_0, s_1, s_2 \in S$,
if $Rep(s_0) = glb_P(Rep(s_1), Rep(s_2))$ then $s_1 \cap s_2 = s_0$.

Theorem 4 gives a necessary condition for the isomorphism between the
posets $\langle S^{\cap}, \subseteq \rangle$ and $\langle M(P), \le \rangle$, namely the fact
that for all $s_0, s_1, s_2 \in S$, if $Rep(s_0) = glb_P(Rep(s_1), Rep(s_2))$ then
$s_1 \cap s_2 = s_0$. Note that this condition is not sufficient, as the example of Fig. 0
discussed in Sect. 1 shows. Inspecting Figs. 0 and 0 (left) we see that the condition
holds for all $s_0, s_1, s_2 \in S$. However, the posets $\langle S^{\cap}, \subseteq \rangle$
and $\langle M(P), \le \rangle$ are not isomorphic, since the sets $S^{\cap}$ and M(P) have
different cardinalities.

From Theorem 4 the following corollaries follow.

Corollary 3. Let S be a class of sets with a set-containment relation and a
representation P. If $S^{\cap}$ is isomorphic to the normal completion of P, then
for each $s_1 \in S_0$ and for each $s \in S$ we have $s_1 \cap s = s_1$ or
$s_1 \cap s = \emptyset$, where
$S_0 = \{s \in S \mid \forall x \in S,\ \text{if } x \subset s \text{ then } x = \emptyset\}$.

Corollary 4. Let S be a class of sets with a set-containment relation and a
representation P. If $S^{\cap}$ is isomorphic to the normal completion of P, then for
every $s_1, s_2 \in S_0$, $s_1 \cap s_2 = \emptyset$, where
$S_0 = \{s \in S \mid \forall x \in S,\ \text{if } x \subset s \text{ then } x = \emptyset\}$.
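The two necessary conditions are easy to test on a small class of sets. The Python sketch below computes the sets whose only proper subset in the class is the empty set and then checks both conditions. The function names and the two example classes are hypothetical, and the second check assumes that the pairwise-disjointness condition is meant for distinct sets.

```python
def minimal_sets(S):
    """S_0: sets of S whose only proper subsets in S are empty."""
    return [s for s in S if all(not (x < s) or not x for x in S)]


def contained_or_disjoint(S):
    """Each s1 in S_0 is either contained in or disjoint from every s in S."""
    empty = frozenset()
    return all((s1 & s) in (s1, empty) for s1 in minimal_sets(S) for s in S)


def pairwise_disjoint(S):
    """Distinct sets of S_0 do not overlap (distinctness is an assumption)."""
    S0 = minimal_sets(S)
    return all(not (a & b) for a in S0 for b in S0 if a != b)


# Hypothetical classes of sets, written as frozensets of point labels.
good = [frozenset({1, 2, 3, 4}), frozenset({1, 2}), frozenset({3, 4})]
bad = [frozenset({1, 2, 3}), frozenset({1, 2}), frozenset({2, 3})]

good_ok = contained_or_disjoint(good) and pairwise_disjoint(good)
bad_ok = contained_or_disjoint(bad) and pairwise_disjoint(bad)
```

The class `bad` fails both checks because its two minimal sets overlap in the single point 2, which is not itself a member of the class. Passing the checks does not by itself establish the isomorphism, since the conditions are only necessary.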

As discussed earlier, the existence of a universal partition is a sufficient, but
not necessary, condition for the isomorphism between the closure of a class S
of sets with respect to the set-intersection operator and the MacNeille completion
of a representation P of S. This means that the converse of Theorem 3
is not true: if there exists an isomorphism between $S^{\cap}$ and M(P), a universal
partition of S does not necessarily exist (see again the example discussed above).
However, thanks to Corollaries 3 and 4 we can effectively pursue the
other alternative towards defining necessary conditions for the isomorphism, namely
imposing additional constraints on the class S. To this end we introduce the
following definition.

Definition 8. Let S be a class of sets with a set-containment relation, and let $s_t$
be its greatest set. We say that S is consistent with respect to the set-containment
relation if $\bigcup_{x \in S_0} x = s_t$, where
$S_0 = \{s \in S \mid \forall x \in S,\ \text{if } x \subset s \text{ then } x = \emptyset\}$.

The assumption that a class of sets is consistent is reasonable in many
cases, since it means that if a set strictly contains another set, then the difference
between the two sets is an 'entity' which has to be represented in the class S.
For example, in a spatial database where a piece of land is represented together with a
city contained in it, it seems reasonable that the part of the land outside the
city is also identified as an entity.

We can show that for a consistent class S the isomorphism between $S^{\cap}$ and
M(P) implies the existence of a universal partition of S.

Theorem 5. Let S be a class of sets with a set-containment relation and a
representation P. If $S^{\cap}$ is isomorphic to the normal completion of P and S is
consistent, then there exists a universal partition on S.

Putting together Theorem 3 and Theorem 5 we obtain the following corollary,
which shows how strictly the existence of an isomorphism between $S^{\cap}$ and M(P)
is connected with that of a universal partition on S.

Corollary 5. Let S be a class of sets with a set-containment relation and a
representation P. Let S be consistent. Then $S^{\cap}$ is isomorphic to the normal
completion of P iff there exists a universal partition on S.

This result means that in a spatial database that works with poset representations
of consistent classes of sets, the only way to perform spatial intersections
among sets by means of the normal completion operator is to provide the
database with a universal partition.



4 Conclusions and Future Works 

Partially ordered sets (posets) are widely used to represent classes of sets with a
set-containment relation. In this paper we have addressed the problem of how to
perform natural set manipulations on a class by means of a poset representation
of the class. Concerning set intersection, we have stated sufficient and necessary
conditions for the correct use of the normal completion operator as a representant
of the set-intersection operator. Moreover, for classes of sets satisfying a slightly more
restrictive condition, we found a condition that is both necessary and sufficient.
Our results give further motivation for the use of posets to represent classes
of sets with a set-containment relation, which was first advocated by Kainz,
Egenhofer and Greasley, who proved the importance of normal completion
as a formal tool in modelling data for spatial databases. Future work will
concentrate on characterizing the set-union operator as well.
Abstract. We present a practical meldable priority queue implementation.
All priority queue operations are very simple and their logarithmic
time bound holds with high probability, which makes this data structure
more suitable for real-time applications than those with only amortized
performance guarantees. Our solution is also space-efficient, since it does
not require storing any auxiliary information within the queue nodes.



1 Introduction 

In this paper we present a randomized approach to the problem of efficient
meldable priority queue implementation. The operations supported by this data
structure are the following:

MakeQueue returns an empty priority queue.

FindMin(Q) returns the minimum item from priority queue Q.

DeleteMin(Q) deletes and returns the minimum item from priority queue Q.

Insert(Q, e) inserts item e into priority queue Q.

Meld($Q_1$, $Q_2$) returns the priority queue formed by combining disjoint priority
queues $Q_1$ and $Q_2$.

DecreaseKey(Q, e, e') replaces item e by e' in priority queue Q provided e' < e
and the location of e in Q is known.

Delete(Q, e) deletes item e from priority queue Q provided the location of e
in Q is known.

(The last two operations are sometimes considered optional.)

Existing priority queue implementations follow one of two approaches. Most
data structures store additional balance information in the queue nodes in order
to guarantee the worst-case efficiency of individual operations (e.g. leftist trees,
relaxed heaps, Brodal queues). Others achieve good amortized performance by
adjusting the structure during some operations rather than struggling to maintain
balance constantly (skew heaps, pairing heaps). Experiments indicate that the latter
approach is more promising in practice. This is because the worst-case efficient
structures tend to be complex and hard to implement, so the large constant factors
hidden in their complexity estimates outweigh their theoretically superior performance.



Randomized Meldable Priority Queues 345 



On the other hand, the main disadvantage of the amortized approach is that
it cannot be applied in real-time programs, where the worst-case bound on the
running time of each individual operation is crucial.

Our solution, both simple and worst-case efficient (in the probabilistic sense),
avoids these drawbacks by adopting the randomized approach, earlier applied to
construct abstract data structures mainly in the context of dictionaries.
The idea is loosely based on leftist trees and skew heaps. In both structures all
other operations are defined in terms of Meld, which is performed along the
right paths of the melded trees. The subtrees of a node on the right path are
exchanged in order to keep the path short: in leftist trees sometimes (depending on
their ranks), in skew heaps always. In our data structure the Meld operation is
performed along random paths in the melded trees. This approach has the following
advantages:

Simplicity. All operations are easy to implement and the constant factors in
their complexity bounds are small; thus, given a fast random number generator,
the heaps should perform well in practice.

Space economy. Since we do not need to preserve any balance conditions, no
satellite information within nodes is necessary.

Applicability to parallel computations. The single-pass top-down scheme of
each operation allows a sequence of operations to be performed in a pipelined
fashion. Moreover, the loose structure of a heap allows disjoint
sets of nodes to be processed independently.

Worst-case efficiency. The execution time of each individual operation is at
most logarithmic with high probability. The expected time behaviour depends
on the random choices made by the algorithm rather than on the distribution
of an input sequence, which allows using this data structure in
real-time applications.

The rest of this paper is organized as follows. In Section 2 we describe the
data structure and the implementation of meldable priority queue operations.
Section 3 is devoted to the efficiency analysis of these algorithms. Section 4
presents some experimental results. Finally, Section 5 contains a discussion of some
extensions of the data structure and the conclusions.

2 The Randomized Heap 

The underlying data structure of the randomized heap is a binary tree with one
item per node, satisfying the heap property: if x and y are nodes and x is the parent
of y, then item(x) ≤ item(y). The heap is accessed through the root of the tree.

Let us now describe the implementation of the meldable priority queue operations
for the randomized heap. MakeQueue returns an empty tree and FindMin returns
the item held in the root. In order to Meld two nonempty trees with roots $Q_1$
and $Q_2$, respectively, we compare the items held in the roots. The root with the
smaller key, say $Q_1$, becomes the root of the resulting tree, and $Q_2$, the remaining
one, is recursively melded with either the left or the right child of $Q_1$, depending on
the outcome of a coin toss. A more formal definition is given by the following
pseudocode:

heap function Meld(heap Q1, Q2)
    if Q1 = null ⇒ return Q2
    if Q2 = null ⇒ return Q1
    if item(Q1) > item(Q2) ⇒ Q1 ↔ Q2
    if toss_coin = heads ⇒ left(Q1) := Meld(left(Q1), Q2)
    else right(Q1) := Meld(right(Q1), Q2)
    return Q1



(The results of Section 3 imply that the recursion depth is at most logarithmic
with high probability. Moreover, this tail recursion is easily removable and serves
the purpose of increasing readability only.)
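As an illustration of how the tail recursion can be removed, here is a minimal Python sketch of Meld written as a loop. The `Node` class, the `rng` parameter and the helpers `is_heap` and `count` are assumptions introduced for this example only; the paper itself gives just the pseudocode above.

```python
import random


class Node:
    """Heap node: one item and two child links, no balance information."""

    def __init__(self, item):
        self.item = item
        self.left = None
        self.right = None


def meld(q1, q2, rng=random):
    """Meld two heaps along a random path (iterative form of the recursion)."""
    if q1 is None:
        return q2
    if q2 is None:
        return q1
    if q1.item > q2.item:
        q1, q2 = q2, q1
    root = q1                          # the smaller root is the final root
    while True:
        side = "left" if rng.random() < 0.5 else "right"   # coin toss
        child = getattr(q1, side)
        if child is None:              # exterior node reached: hang q2 here
            setattr(q1, side, q2)
            return root
        if child.item > q2.item:       # keep the smaller root on the path
            child, q2 = q2, child
        setattr(q1, side, child)
        q1 = child


def is_heap(node):
    """Check item(parent) <= item(child) in the whole tree."""
    for child in (node.left, node.right):
        if child is not None and (node.item > child.item or not is_heap(child)):
            return False
    return True


def count(node):
    """Number of nodes in the tree."""
    return 0 if node is None else 1 + count(node.left) + count(node.right)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when testing.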

The simplest way to describe all remaining priority queue operations is to
define them in terms of Meld. In order to Insert item e into heap Q we create
a single node containing item e and meld it with Q. DeleteMin melds the left
and right subtrees of the root and returns the item held in the (old) root.

For DecreaseKey and Delete we need a parent pointer in each node.
In order to decrease the value of node x in heap Q we detach the tree rooted
at x from Q, adjust the item at x accordingly and then meld Q with the heap
rooted at x. Operation Delete also detaches the tree rooted at x from heap
Q, then performs DeleteMin on the heap rooted at x and finally melds the
resulting heap and Q.
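The remaining operations can be sketched in Python as follows. This is one possible arrangement under stated assumptions, not the authors' code: the class `RandomizedHeap`, the helper `detach`, and the use of the node returned by `insert` as the "known location" of an item are all choices made for this example.

```python
import random


class Node:
    def __init__(self, item):
        self.item = item
        self.left = self.right = self.parent = None


def meld(q1, q2):
    """Recursive Meld that also maintains parent pointers."""
    if q1 is None:
        return q2
    if q2 is None:
        return q1
    if q1.item > q2.item:
        q1, q2 = q2, q1
    if random.random() < 0.5:
        q1.left = meld(q1.left, q2)
        q1.left.parent = q1
    else:
        q1.right = meld(q1.right, q2)
        q1.right.parent = q1
    return q1


def detach(x):
    """Cut the subtree rooted at x from its parent (no-op for a root)."""
    p = x.parent
    if p is not None:
        if p.left is x:
            p.left = None
        else:
            p.right = None
        x.parent = None


class RandomizedHeap:
    def __init__(self):
        self.root = None

    def insert(self, item):
        node = Node(item)
        self.root = meld(self.root, node)
        return node                     # handle used by decrease_key/delete

    def find_min(self):
        return self.root.item

    def delete_min(self):
        old = self.root
        self.root = meld(old.left, old.right)
        if self.root is not None:
            self.root.parent = None     # the new root has no parent
        return old.item

    def decrease_key(self, x, new_item):
        assert new_item <= x.item
        x.item = new_item
        if x is not self.root:
            detach(x)
            self.root = meld(self.root, x)

    def delete(self, x):
        if x is self.root:
            self.delete_min()
            return
        detach(x)
        rest = meld(x.left, x.right)    # DeleteMin on the heap rooted at x
        if rest is not None:
            rest.parent = None
        self.root = meld(self.root, rest)
```

A caller keeps the handles returned by `insert` and passes them back to `decrease_key` or `delete`; `find_min` and `delete_min` assume a non-empty heap.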



3 The Efficiency Analysis 

Since all non-constant-time operations are defined in terms of Meld, it is enough 
to analyze the complexity of melding two randomized heaps. 

Let us fix an arbitrary binary tree Q with n interior nodes containing keys
and n + 1 exterior null nodes, the leaves of the tree. Define a random variable
$h_Q$ to be the length (the number of edges) of a random path from the root down
to an exterior node (the child following each interior node on a path is chosen
randomly and independently). In other words, the probability space is the set
of all exterior nodes in Q with the probability of a node at depth t equal to $2^{-t}$,
and $h_Q$ is the depth of an exterior node chosen randomly with respect to this
distribution.

Lemma 1. Melding two randomized heaps $Q_1$ and $Q_2$ requires time
$O(h_{Q_1} + h_{Q_2})$.

Proof. The melding procedure traverses a random path in each tree until an
exterior node in one of them is reached. □

It follows from Lemma 1 that in order to bound the complexity of melding
randomized heaps it is enough to estimate $h_Q$ for an arbitrary binary tree Q.
Theorem 1. Let Q be an arbitrary binary tree with n interior nodes.

(a) The expected value satisfies $E h_Q \le \log(n+1)$.

(b) $\Pr[h_Q > (c+1)\log n] < n^{-c}$ for any constant c > 0.

Proof. (a) The proof follows by induction on n. Assume n > 0 and let $n_L$ and
$n_R$ be the number of interior nodes in the left ($Q_L$) and right ($Q_R$) subtree of
Q, respectively (thus $n = n_L + n_R + 1$). We have

$$E h_Q = \tfrac{1}{2}\bigl((1 + E h_{Q_L}) + (1 + E h_{Q_R})\bigr) \le 1 + \tfrac{1}{2}\bigl(\log(n_L + 1) + \log(n_R + 1)\bigr)$$

$$= \log 2\sqrt{(n_L + 1)(n_R + 1)} \le \log 2\,\frac{(n_L + 1) + (n_R + 1)}{2}$$

$$= \log(n_L + n_R + 2) = \log(n + 1).$$

(b) Note that for any fixed path γ from the root to an exterior node, the
probability that γ is the outcome of a random walk down the tree equals
$2^{-|\gamma|}$, where |γ| is the length of γ.

Let Γ be the set of all paths from the root to an exterior node in Q with
length exceeding (c + 1) log n. We have

$$\Pr[h_Q > (c+1)\log n] = \sum_{\gamma \in \Gamma} 2^{-|\gamma|} < \sum_{\gamma \in \Gamma} 2^{-(c+1)\log n} \le n \cdot n^{-(c+1)} = n^{-c}.$$

□
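The bound in part (a) can be checked numerically on small trees. The sketch below computes the expected random path length exactly from the recurrence used in the proof; the nested-tuple tree encoding and the helper names are assumptions made for this example.

```python
import math


def expected_path_length(tree):
    """Exact E[h_Q]; a tree is None (exterior node) or a pair (left, right)."""
    if tree is None:
        return 0.0
    left, right = tree
    return 1.0 + 0.5 * (expected_path_length(left) + expected_path_length(right))


def size(tree):
    """Number of interior nodes."""
    return 0 if tree is None else 1 + size(tree[0]) + size(tree[1])


def full_tree(depth):
    """Perfectly balanced tree with 2**depth - 1 interior nodes."""
    return None if depth == 0 else (full_tree(depth - 1), full_tree(depth - 1))


def left_path(n):
    """Degenerate tree: a single left-going path of n interior nodes."""
    tree = None
    for _ in range(n):
        tree = (tree, None)
    return tree


balanced = full_tree(3)      # n = 7, every exterior node at depth 3
thin = left_path(7)          # n = 7, exterior nodes at depths 1, 2, ..., 7, 7
e_balanced = expected_path_length(balanced)
e_thin = expected_path_length(thin)
bound = math.log2(size(balanced) + 1)
```

For the balanced tree the bound is attained with equality, while the path-shaped tree stays well below it, in line with the remark that the value is larger for more balanced trees.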



Corollary 1. The expected time of any meldable priority queue operation on an
n-node randomized heap is O(log n). Moreover, for each constant ε > 0 there
exists a constant c > 0 such that the probability that the time of each operation
is at most c log n exceeds $1 - n^{-\varepsilon}$.

Proof. Immediate by Lemma 1 and Theorem 1. □

4 Experiments 

We have carried out some tests to measure the behaviour of the randomized heap
in practice. It is not hard to see that the value $h_Q$ is bigger for more balanced
trees and smaller for “thinner” ones. When we create a tree by inserting the keys
1, ..., n in the order of some permutation π, the tree is more balanced if π
is closer to the sorted sequence ⟨1, ..., n⟩, and “thinner” if π is closer to the
inverted sequence ⟨n, ..., 1⟩. Thus our methodology was the following: for a
fixed n we successively created a tree

- from an almost sorted permutation (a small number of transpositions away from ⟨1, ..., n⟩),

- from a random permutation,

- from an almost inversely sorted permutation (a small number of transpositions away from ⟨n, ..., 1⟩),

then we computed the value of h and the total length of paths traversed while
melding two such trees (both consisting of keys 1, ..., n). Since we can get different
trees even from a fixed permutation, the outcomes were averaged over 100
tests for each value of n.

The results are summarized in the following table (each displayed value is
the factor c in the expression c log(n + 1)):





            Almost sorted       Random              Almost inversely
            permutation         permutation         sorted permutation
    n       h       meld        h       meld        h       meld
    50      0.85    1.34        0.79    1.28        0.63    0.79
    500     0.80    1.39        0.76    1.31        0.50    0.65
    5000    0.78    1.41        0.74    1.32        0.40    0.49
    15000   0.77    1.42        0.73    1.32        0.36    0.41



It turns out that in the case of a tree obtained from a random permutation the
value of h is roughly 3/4 of the value for the full tree. Moreover, the total length of
paths traversed while melding two such trees is about 15% smaller than the doubled
value of h used for the estimate in Lemma 1. (This is not surprising, because
only one of the two random paths is traversed to the very end while melding.)

5 Conclusions 

Before the concluding remarks let us note that the flexibility of the randomized
heap can be increased by scaling it in a manner similar to the well-known d-ary
heaps. Let us fix an integer d > 2 and let the underlying structure of the heap
be a tree with at most d children in each node (kept in an array of size d). The
only change to operation Meld is that instead of tossing a symmetric coin we
choose a value t from {1, ..., d} at random and recursively meld the tree with the
bigger key at the root with the t-th subtree of the other tree. An easy adaptation
of the proofs from Section 3 gives the following estimates for the complexity of
operations on a randomized d-heap with at most n nodes:

- MakeQueue, FindMin: O(1)

- Meld, DecreaseKey: $O(\log_d n)$

- Insert: $O(d + \log_d n)$ (we have to initialize d pointers in the new node to
null)

- DeleteMin, Delete: $O(d \log_d n)$ (we have to meld O(d) heaps)
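The d-ary variant changes very little in code. The following Python sketch shows the modified Meld and a DeleteMin that melds the children of the old root one by one; `DNode`, `meld_d` and `delete_min_d` are hypothetical names used only for this illustration.

```python
import random


class DNode:
    def __init__(self, item, d):
        self.item = item
        self.children = [None] * d      # O(d) initialisation of the pointers


def meld_d(q1, q2, d, rng=random):
    """Meld two randomized d-heaps along a random path."""
    if q1 is None:
        return q2
    if q2 is None:
        return q1
    if q1.item > q2.item:
        q1, q2 = q2, q1
    t = rng.randrange(d)                # uniform choice among the d subtrees
    q1.children[t] = meld_d(q1.children[t], q2, d, rng)
    return q1


def delete_min_d(q, d, rng=random):
    """Return (minimum item, heap formed by melding the root's subtrees)."""
    rest = None
    for child in q.children:            # O(d) melds
        rest = meld_d(rest, child, d, rng)
    return q.item, rest
```

Larger values of d shorten the random paths at the price of more work per DeleteMin, which is the trade-off expressed by the bounds above.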

We have presented a very simple randomized data structure capable of supporting
all meldable priority queue operations in logarithmic time with high probability.
The experiments show that the constant factors in the complexity of
the operations are in fact even smaller than those derived from the theoretical
analysis. Simplicity, flexibility and small memory overhead make the randomized
heap a practical choice for a meldable priority queue with worst-case
performance guarantees.

The following question looks like a good starting point for further research:
does the randomized approach allow us to lower the asymptotic complexity of
some meldable priority queue operations while keeping the data structure simple?
Abstract. The problem of reconstructing a discrete set from its horizontal
and vertical projections (RSP) is of primary importance in many
different areas, for example pattern recognition, image processing and
data compression.

We give a new algorithm which reconstructs convex polyominoes from
horizontal and vertical projections. It costs at most
$O(\min(m, n)^2 \cdot mn \log mn)$ for a matrix that has m × n cells. In this
paper we provide just a sketch of the algorithm.



1 Introduction 

1.1 Definition of the Problem 

Let R be a matrix which has m × n cells containing “0”s and “1”s. Let S be the
set of cells containing “1”s. Given S, we let $h_i(S)$ be the number of cells
containing “1” in the ith row of S and $v_j(S)$ be the number of cells
containing “1” in the jth column of S. We call $h_i(S)$ the ith row projection of S
and $v_j(S)$ the jth column projection of S.
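The projections are simply row and column sums. A minimal Python sketch, assuming the matrix is given as a list of rows of 0/1 integers (the function name and the example matrix are illustrative):

```python
def projections(R):
    """Row projections H and column projections V of a 0/1 matrix."""
    H = [sum(row) for row in R]
    V = [sum(col) for col in zip(*R)]
    return H, V


# Hypothetical 4 x 4 example.
R = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
H, V = projections(R)
```

Both vectors necessarily have the same total, since each counts every cell containing “1” exactly once.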

We consider the following properties of a set S. We say that a set S of cells
satisfies the properties p, v and h if

p: S is a polyomino, i.e. S is a connected finite set.

v: every column of S is a connected set, i.e. no column of R contains a “0” between
two different “1”s.

h: every row of S is a connected set, i.e. no row of R contains a “0” between two
different “1”s.

The set S belongs to the class (x) (S ∈ (x)) iff it satisfies the properties x.

We can now define the problem of reconstructing a set S from its projections:
given two assigned vectors $H = (h_1, h_2, \dots, h_m) \in \{1, \dots, n\}^m$ and
$V = (v_1, v_2, \dots, v_n) \in \{1, \dots, m\}^n$, we examine whether the pair (H, V) is
satisfiable in class (x). It is satisfiable if there is at least one set S ∈ (x) such that
$h_i(S) = h_i$ for i = 1, ..., m, and $v_j(S) = v_j$ for j = 1, ..., n. We also say that
S satisfies (H, V) in (x).

We define a set S as a convex polyomino if S ∈ (p, v, h).

Fig. 1. A convex polyomino that satisfies (H, V)
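The three properties can be tested directly on a 0/1 matrix. The sketch below is one straightforward way to do so in Python; it assumes that "connected" for property p means connected through shared edges (4-neighbourhood) and treats the empty set as not being a polyomino. The function names are made up for this example.

```python
from collections import deque


def line_connected(line):
    """True if the 1s of a row or column form one contiguous run (or none)."""
    ones = [k for k, c in enumerate(line) if c == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def has_h(R):
    return all(line_connected(row) for row in R)


def has_v(R):
    return all(line_connected(col) for col in zip(*R))


def has_p(R):
    """Breadth-first search over edge-adjacent cells containing 1."""
    cells = {(i, j) for i, row in enumerate(R) for j, c in enumerate(row) if c == 1}
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        i, j = queue.popleft()
        for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == cells


def is_convex_polyomino(R):
    return has_p(R) and has_v(R) and has_h(R)
```

Such a checker is useful for validating the output of a reconstruction procedure against the class (p, v, h).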



1.2 Previous Work

First Ryser and subsequently Chang and Wang studied the existence of a set
S satisfying (H, V) in the class of sets without any conditions (∅). They showed
that the decision problem can be solved in O(mn) time. These authors also
developed some algorithms that reconstruct S starting from (H, V).

Woeginger proved that the reconstruction problem in the classes of horizontally
and vertically convex sets (h, v) and polyominoes (p) is NP-complete.

Barcucci, Del Lungo, Nivat and Pinzani showed that the reconstruction
problem is NP-complete in the class of column-convex polyominoes (p, v)
(row-convex polyominoes (p, h)) and in the class of sets having connected columns
(v) (rows (h)). Therefore, the problem can be solved in polynomial time only if
all three properties (p, h, v) are satisfied by the cell set.

An algorithm that establishes the existence of a convex polyomino (p, v, h)
satisfying a pair of assigned vectors (H, V) in polynomial time has been described
previously. The main idea of this algorithm is to construct certain initial positions
of some “0”s and “1”s and to perform a procedure called the filling operation for
each such position. We call them the feet's positions. The number of possible
feet's positions in the algorithm is $O(m^2 n^2)$. The filling operation procedure
costs $O(m^2 n^2)$. Hence, the whole algorithm has complexity $O(m^4 n^4)$.

In this paper we show a variant of the above algorithm which has complexity
$O(\min(m, n)^2 \cdot mn \log mn)$. In Section 2 we describe some properties of convex
polyominoes. In Section 3 we show a new filling operation procedure whose
complexity is only O(mn log mn). In Section 4 we describe the idea of new initial
positions which give a correct solution.

2 Some Convex Polyomino Properties 

We follow the notation of the previous work. We assume n ≤ m in the matrix R; if n > m
we can exchange columns with rows. Moreover we assume

$$\sum_{j=1}^{m} h_j = \sum_{i=1}^{n} v_i,$$

otherwise no solution exists.




Fig. 2. Some properties of a convex polyomino: $N_{u_j - 1} \subseteq W_{j-1} \subseteq N_{d_j}$



Let $(n_1, n_2)$ be the positions of “1”s in the upper row, i.e. the first row contains
“1”s in the cells from $n_1$ to $n_2$, and let $(s_1, s_2)$ be the positions of “1”s in the
lower (m-th) row. These cells are called the feet's positions. Let us introduce the
following notation:

$$H_k = \sum_{j=1}^{k} h_j, \qquad V_k = \sum_{i=1}^{k} v_i, \qquad A = \sum_{i=1}^{m} h_i = \sum_{i=1}^{n} v_i.$$

We assume that $n_2 < s_1$ (the case $s_2 < n_1$ is similar, and we do not consider
other cases). Let $W_j$ be the set of “1”s in the first j columns, and let $N_i$ be the set of
“1”s in the first i rows (see Fig. 2). Let $R(u_j, j)$ and $R(d_j, j)$ be the uppermost and the
lowest cells of the j-th column containing “1”.

Proposition 1. [2] For all $j \in [n_2 + 1 .. s_1 - 1]$ we have

$N_{u_j - 1} \subseteq W_{j-1}$ and $W_{j-1} \subseteq N_{d_j}$.

From the above proposition and its variants we get:

Corollary 1. If $n_2 < s_1$ then for all $j \in [n_2 + 1 .. s_1 - 1]$ we have

$H_{u_j - 1} \le V_{j-1}$ and $H_{d_j} \ge V_j$,

$A - H_{d_j} \le A - V_j$ and $A - H_{u_j - 1} \ge A - V_{j-1}$.

If $s_2 < n_1$ then for all $j \in [s_2 + 1 .. n_1 - 1]$ we have

$H_{u_j - 1} \le A - V_j$ and $H_{d_j} \ge A - V_{j-1}$,

$A - H_{d_j} \le V_{j-1}$ and $A - H_{u_j - 1} \ge V_j$.

We use the above properties in Section 4 for finding the positions of some initial “1”s.
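As a sanity check of how the prefix sums relate to the column extents, the following Python sketch evaluates, on one hand-made convex polyomino whose upper foot lies to the left of its lower foot, the inequalities that the partial row sum above a column's top cell is at most the partial column sum to its left, and that the partial row sum down to its lowest cell is at least the partial column sum up to and including it. The example matrix and all helper names are assumptions for illustration; indices are 1-based as in the text.

```python
from itertools import accumulate

# Hypothetical convex polyomino running from the top-left to the bottom-right.
R = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]

h = [sum(row) for row in R]
v = [sum(col) for col in zip(*R)]
H = [0] + list(accumulate(h))      # H[k] = h_1 + ... + h_k, with H[0] = 0
V = [0] + list(accumulate(v))      # V[k] = v_1 + ... + v_k, with V[0] = 0
A = H[-1]                          # total number of 1s


def column_extent(R, j):
    """(u_j, d_j): rows of the uppermost and lowest 1 in column j (1-based)."""
    rows = [i + 1 for i, row in enumerate(R) if row[j - 1] == 1]
    return rows[0], rows[-1]


n2 = max(k + 1 for k, c in enumerate(R[0]) if c == 1)    # right end of upper foot
s1 = min(k + 1 for k, c in enumerate(R[-1]) if c == 1)   # left end of lower foot

checks = []
for j in range(n2 + 1, s1):        # columns strictly between the two feet
    u, d = column_extent(R, j)
    checks.append(H[u - 1] <= V[j - 1] and H[d] >= V[j])
```

Here only column 3 lies between the feet, and both inequalities hold for it.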

3 Filling Operation 

We use balanced binary trees (e.g. AVL trees) in our procedure, with the
following operations:

empty(tree): a function returning true when the tree is empty and false otherwise.
It always costs O(1).

delete(k, tree): a procedure deleting an element k from the tree. Its complexity
is O(log |tree|), where |tree| denotes the size of the tree (the number of
elements in it).

insert(k, tree): a procedure putting k in the tree if k ∉ tree and doing
nothing otherwise. Its complexity is O(log |tree|).

min(tree): a function returning the minimal element of the tree. It costs
O(log |tree|).

max(tree): a function returning the maximal element of the tree. It costs
O(log |tree|).
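For experimenting with the procedure, the five operations can be mimicked with a sorted Python list. This is only a stand-in with the same interface: the class name `SortedSet` is hypothetical, and list insertion and deletion cost linear time here, so a genuine balanced tree would be needed to obtain the logarithmic bounds stated above.

```python
import bisect


class SortedSet:
    """Ordered set of integers backed by a sorted list (not a balanced tree)."""

    def __init__(self):
        self._keys = []

    def empty(self):
        return not self._keys

    def insert(self, k):
        """Insert k if it is absent; do nothing otherwise."""
        pos = bisect.bisect_left(self._keys, k)
        if pos == len(self._keys) or self._keys[pos] != k:
            self._keys.insert(pos, k)

    def delete(self, k):
        """Remove k if it is present."""
        pos = bisect.bisect_left(self._keys, k)
        if pos < len(self._keys) and self._keys[pos] == k:
            del self._keys[pos]

    def min(self):
        return self._keys[0]

    def max(self):
        return self._keys[-1]
```

Because `insert` ignores duplicates, a row or column number queued several times for review is still processed only once per pass.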

We have two global variables tree_col and tree_row, which are balanced binary
trees. In these trees we store the numbers of the columns and rows, respectively,
which we will review in the next step of the main loop of our procedure.

For each row i, where i ∈ [1, ..., m], we define the following auxiliary variables:
l^i, r^i, p^i, q^i, l̃^i, r̃^i, p̃^i, q̃^i, free0^i (for each column j, where j ∈ [1, ..., n], we
define l_j, r_j, p_j, q_j, l̃_j, r̃_j, p̃_j, q̃_j, free0_j, respectively). The variable l is the minimal
position containing “1”, r is the maximal position containing “1”, p is the minimal
position without “0” and q is the maximal position without “0”, for
all rows and columns. The variables l̃, r̃, p̃, q̃ are temporary values of l, r, p,
q, respectively. The variable free0 is the balanced binary tree containing the “0”
positions which are between p and q.

We initialize these variables in a row as follows: l = l̃ = n + 1, r = r̃ = 0,
p = p̃ = 1, q = q̃ = n, free0 = nil (in a column: l = l̃ = m + 1, r = r̃ = 0, p = p̃ = 1,
q = q̃ = m, free0 = nil), where nil means the empty tree.

We introduce two auxiliary operations:

put “0” in the ith row in the jth position:
    if R[i, j] = 1 then exit(fail) {we break the procedure in this case}
    if R[i, j] ≠ 0 then {it is a new “0”}
        R[i, j] ← 0
        insert(j, tree_col)
        if i < p̃_j + v_j and i ≥ p̃_j then
            p̃_j ← i + 1
            while not empty(free0_j) and (k ← min(free0_j)) < p̃_j + v_j do
                delete(k, free0_j)
                p̃_j ← k + 1
        if i > q̃_j − v_j and i ≤ q̃_j then
            q̃_j ← i − 1
            while not empty(free0_j) and (k ← max(free0_j)) > q̃_j − v_j do
                delete(k, free0_j)
                q̃_j ← k − 1
        if p̃_j + v_j ≤ i ≤ q̃_j − v_j then insert(i, free0_j)

put “1” in the ith row in the jth position:
    if R[i, j] = 0 then exit(fail) {we break the procedure in this case}
    if R[i, j] ≠ 1 then {it is a new “1”}
        R[i, j] ← 1
        insert(j, tree_col)
        if r_j < l_j then {column j hasn't “1”s}
            l_j ← r_j ← l̃_j ← r̃_j ← i
            if p̃_j < i − v_j + 1 then p̃_j ← i − v_j + 1
            if q̃_j > i + v_j − 1 then q̃_j ← i + v_j − 1
            while not empty(free0_j) do
                k ← min(free0_j)
                delete(k, free0_j)
                if k < i and k + 1 > p̃_j then p̃_j ← k + 1
                if k > i and k − 1 < q̃_j then q̃_j ← k − 1
        else {column j has “1”s}
            if i < l̃_j then l̃_j ← i
            if i > r̃_j then r̃_j ← i
The operations described above record the number of each column that is
modified when we put a new symbol in a row. We define these operations in
columns analogously.

Now we define the operations ⊕, ⊖, ⊗, ⊘ described in earlier work. They have
the following form:

operation ⊕ in the ith row:
    if l̃^i < l^i then
        for j ← l̃^i to l^i − 1 do put “1” in the ith row in the jth position
    if r̃^i > r^i then
        for j ← r^i + 1 to r̃^i do put “1” in the ith row in the jth position

operation ⊖ in the ith row:
    if p^i < p̃^i then
        for j ← p^i to p̃^i − 1 do put “0” in the ith row in the jth position
        p^i ← p̃^i
    if q^i > q̃^i then
        for j ← q̃^i + 1 to q^i do put “0” in the ith row in the jth position
        q^i ← q̃^i

operation ⊗ in the ith row:
    if l^i > r^i and p^i + h_i − 1 ≥ q^i − h_i + 1 then
        l^i ← l̃^i ← q^i − h_i + 1
        r^i ← r̃^i ← p^i + h_i − 1
        for j ← l^i to r^i do put “1” in the ith row in the jth position
    if l^i ≤ r^i and q^i − h_i + 1 < l^i then
        for j ← q^i − h_i + 1 to l^i − 1 do
            put “1” in the ith row in the jth position
        l^i ← l̃^i ← q^i − h_i + 1
    if l^i ≤ r^i and p^i + h_i − 1 > r^i then
        for j ← r^i + 1 to p^i + h_i − 1 do
            put “1” in the ith row in the jth position
        r^i ← r̃^i ← p^i + h_i − 1

operation ⊘ in the ith row:
    if l^i ≤ r^i and p^i ≤ r^i − h_i then
        for j ← p^i to r^i − h_i do put “0” in the ith row in the jth position
        p^i ← p̃^i ← r^i − h_i + 1
    if l^i ≤ r^i and q^i ≥ l^i + h_i then
        for j ← l^i + h_i to q^i do put “0” in the ith row in the jth position
        q^i ← q̃^i ← l^i + h_i − 1

The operations ⊕, ⊗ put new “1”s in the matrix R and the operations ⊖, ⊘ put
new “0”s. We define these operations in columns analogously.
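The arithmetic behind the two operations that use the row length can be isolated in a few lines of Python. The sketch assumes that a row must contain one block of h consecutive “1”s, that p and q bound the positions not yet excluded by “0”s, and that l and r are the leftmost and rightmost known “1”s; the function names are hypothetical.

```python
def forced_ones(p, q, h):
    """Cells that are 1 in every placement of h consecutive 1s inside [p, q].

    The leftmost placement ends at p + h - 1 and the rightmost one starts at
    q - h + 1, so exactly the cells q - h + 1 .. p + h - 1 are shared.
    """
    return list(range(q - h + 1, p + h))


def allowed_window(l, r, h):
    """Tightest window [p, q] for a block of h 1s that covers cells l..r.

    The block cannot start before r - h + 1 nor end after l + h - 1, so
    every cell outside this window must be 0.
    """
    return r - h + 1, l + h - 1


ones = forced_ones(1, 6, 4)          # block of 4 inside positions 1..6
nothing = forced_ones(1, 6, 3)       # block of 3 leaves no forced cell
window = allowed_window(3, 4, 4)     # known 1s at 3..4, block length 4
```

With positions 1 to 6 and a block of length 4, the cells 3 and 4 are forced; once they are known to be “1”, the window for the block is again 1 to 6, so no new “0” follows in this particular case.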
The main loop of the filling operation procedure now has the following form:

The main loop of the procedure:
    repeat
        while not empty(tree_row) do
            k ← min(tree_row)
            delete(k, tree_row)
            perform operations ⊕, ⊖, ⊗, ⊘ in the kth row
        while not empty(tree_col) do
            k ← min(tree_col)
            delete(k, tree_col)
            perform operations ⊕, ⊖, ⊗, ⊘ in the kth column
    until empty(tree_row) and empty(tree_col)

When we do the preprocessing (described in Section 4) we put neither “0” nor
“1”. We only modify the variables p and q of a particular row or column when it
is necessary. We put the numbers of these rows or columns in tree_row or tree_col,
respectively. The “0”s and “1”s are put later, while performing the filling operation
procedure described above (see the ⊖ operation and the ⊗ operation).

If the filling operation procedure returns fail, we know that a convex polyomino
which has projections H and V (and the same initial position) does not
exist.
Fig. 3. Three disjoint cycles: (a_1, ..., a_6), (b_1, ..., b_6), (c_1, ..., c_18)



If trees treerow and treecoi are empty, we have two different cases: 
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case 1: Each cell of R contain “0” or “1”. We have the solution. The set S' is a 
convex polyomino and satisfies {H,V). 

case 2: Each row contains at least one “1” (we assure this in section 4) and we 
have some cells in R which contain neither “0” nor “1” (see Fig. E|)- If we 
have any row or any column containing these empty cells and at least one 
“1”, then the auxiliary variables in the row or the column will satisfy the 
properties: 

I— P=q — Ty^O. 

If a column contains no “1”, the number of empty cells in this column is
twice the number of “1”s that can be put in this column. Moreover, if
R[i,j] contains neither “0” nor “1”, then there exists R[i',j'] containing
neither “0” nor “1” and satisfying i = i' and |j - j'| = h_i, or j = j' and
|i - i'| = v_j. In addition, the number of empty cells in the whole of R is
twice the number of missing “1”s. Hence the cells that contain neither “0”
nor “1” form a cycle or a union of disjoint cycles, each of which contains at
least 4 cells. The cells of a cycle are labelled alternately “0” and “1”.
However, the labellings of some cycles depend on one another. In order to fill
these cells correctly we build a suitable 2-SAT problem, which can be solved in
linear time (for more details see P). Because the number of empty cells is less
than mn, the additional cost of the solution in this case is at most O(mn).
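
The reduction to 2-SAT is only cited here, so the following is a generic sketch of a linear-time 2-SAT solver (implication graph plus strongly connected components, computed with Kosaraju's algorithm), not the construction used in the paper. The encoding in the usage lines is a hypothetical illustration: one Boolean variable per cycle selects one of its two alternating labellings, and clauses tie dependent cycles together.

```python
def solve_2sat(num_vars, clauses):
    """Solve a 2-SAT instance in time linear in its size.

    Literals are non-zero ints: +k means x_k, -k means (not x_k), k = 1..num_vars.
    Returns a list of booleans (index k-1 holds x_k), or None if unsatisfiable.
    """
    n = 2 * num_vars

    def node(lit):
        # x_k -> 2(k-1), not x_k -> 2(k-1)+1
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    graph = [[] for _ in range(n)]
    rgraph = [[] for _ in range(n)]
    for a, b in clauses:
        # (a or b) is equivalent to (not a -> b) and (not b -> a)
        for u, v in ((node(-a), node(b)), (node(-b), node(a))):
            graph[u].append(v)
            rgraph[v].append(u)

    # Pass 1: iterative DFS on the graph, recording vertices in finishing order.
    visited = [False] * n
    order = []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(graph[u]):
                stack.append((u, i + 1))
                v = graph[u][i]
                if not visited[v]:
                    visited[v] = True
                    stack.append((v, 0))
            else:
                order.append(u)

    # Pass 2: DFS on the reversed graph in decreasing finishing order.
    # Components are numbered in topological order of the condensation.
    comp = [-1] * n
    c = 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = c
                    stack.append(v)
        c += 1

    result = []
    for k in range(num_vars):
        pos, neg = comp[2 * k], comp[2 * k + 1]
        if pos == neg:
            return None  # x_k and not x_k imply each other
        result.append(pos > neg)  # take the literal that comes later topologically
    return result


# Hypothetical dependencies: cycles 1 and 2 need opposite labellings,
# cycles 2 and 3 need the same labelling.
clauses = [(1, 2), (-1, -2), (-2, 3), (2, -3)]
assignment = solve_2sat(3, clauses)
```

Each cell of a cycle then receives “0” or “1” according to the labelling chosen for its cycle.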

Now we estimate the complexity of the main loop of the filling operation
procedure. At each position (i,j) we perform the put operation only twice (once
in the i-th row and once in the j-th column). Moreover, whenever we perform one
of the four filling operations in a row or in a column, we execute at least one
put operation. Hence we review only O(mn) columns and rows; the review of one
row costs O(log n) + [cost of the put operations] and the review of one column
costs O(log m) + [cost of the put operations]. Therefore, the global cost of
the main loop of the algorithm is O(mn(log m + log n)) +
[cost of all put operations].

Now we estimate the global cost of all put operations. When we perform put
operations in the i-th row, we execute at most m insert operations in tree_col,
which costs O(m log m). For all rows the cost is at most O(mn log m).
Analogously, over all columns the cost of the insert operations in tree_row is
at most O(mn log n).

In the i-th row, the insert operations in freeO* are performed at most once
for each position, and there are no more than m delete operations either. We
execute the functions min and max only when modifying p* or Sf. Hence the
number of these operations is at most m, and all operations in the tree freeO*
cost at most O(m log m). For all n rows the cost is at most O(mn log m).
Analogously, over all m columns the cost of the tree operations is at most
O(mn log n).

The complexity of all remaining operations is at most O(mn). Hence, the cost
of the procedure called filling operation is at most O(mn(log m + log n)).
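
The bounds collected in this section can be summarized as follows; this is only a restatement of the estimates above, using the identity log m + log n = log mn.

```latex
\begin{align*}
\text{main loop} &: O\bigl(mn(\log m + \log n)\bigr) + [\text{cost of all put operations}],\\
\text{put operations} &: O(mn \log m) + O(mn \log n),\\
\text{remaining operations} &: O(mn),\\
\text{total} &: O\bigl(mn(\log m + \log n)\bigr) = O(mn \log mn).
\end{align*}
```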

The proof of the correctness of the procedure is a small modification of the 
proof from PQ . 

Theorem 1. The filling operation procedure costs at most O(mn log mn).

4 Main Algorithm

The main idea of the algorithm is to test all possible positions of the “1”s
in the first and last rows, i.e. the positions of the feet. Once the positions
in the upper and lower rows are fixed, we use Corollary Q to compute the
positions of some “1”s in the columns between the feet. We want at least one
“1” in each row when we start the filling procedure described in Section 3;
this ensures that the procedure works correctly.

If the feet positions are $(n_1, n_2)$ and $(s_1, s_2)$ and $n_2 < s_1$, we compute for
all $j \in [n_2 + 1 \,..\, s_1 - 1]$:

$$D_j = \min\{\, i \in [1 \,..\, m-1] : A - H_i < A - V_j \,\},$$

$$U_j = \max\{\, i \in [2 \,..\, m] : H_{i-1} < V_{j-1} \,\}.$$

If $n_1 > s_2$, we compute for all $j \in [s_2 + 1 \,..\, n_1 - 1]$:

$$D_j = \min\{\, i \in [1 \,..\, m-1] : A - H_i < V_{j-1} \,\},$$

$$U_j = \max\{\, i \in [2 \,..\, m] : H_{i-1} < A - V_j \,\}.$$

It is easy to check that always $U_j < D_j$ and, moreover, that $D_j + 1 > U_{j+1}$
in the first case and $U_j + 1 > D_{j+1}$ in the second case. Hence, in the j-th
column we can put “1” in all cells between $U_j$ and $D_j$, and we can put “0” in
the cells above $D_j - v_j + 1$ and below $U_j + v_j - 1$. Moreover, we have all
“1”s and “0”s in the columns determined by the feet positions. Finally, we have
at least one “1” in each row.

Otherwise, if the two feet share a column, that column must contain only
“1”s, because it has a “1” in its first and last positions and the area of
“1”s is connected. Hence, in this case we also have at least one “1” in each
row.

The preprocessing described above costs at most O(m + n).

Assume that there exists a convex polyomino S satisfying (H,V). If we guess
the right feet positions of S (since we test all feet positions, we must
eventually guess them correctly), we have all “1”s and “0”s in the columns
$n_1 \ldots n_2$ and $s_1 \ldots s_2$. Moreover, we have at least one “1” in each column
between the feet positions (if such columns exist). Finally, we have at least
one “1” in each row and each of them is correct. Hence, the filling procedure
cannot answer fail and must return the correct polyomino.

If no convex polyomino S satisfying (H,V) exists for the vectors (H,V), the
filling procedure answers fail.

The number of feet positions to be tested is at most min(m, n)^2. The
preprocessing and the filling procedure cost at most O(mn log mn).
Hence, we have

Theorem 2. The reconstruction of a convex polyomino from its vertical and
horizontal projections costs at most O(min(m, n)^2 · mn log mn).

Abstract. Considering an extension of the object-oriented model that
introduces multiple type objects, we faced the problem of disambiguating
method dispatching. This issue is closely connected with the conflicts
in classes defined by multiple inheritance. We studied the current
work from a broader perspective of conceptual modeling, considering
both theoretical and practical views. As a result, a solution based on
method redefinition constraints is proposed. To present the main ideas,
we use the formal tools of category theory, in accordance with our
former attempts to describe object-oriented models in terms of
categorical constructions.

Key words: object-oriented database model, modeling roles, method 
dispatching, category theory. 



1 Introduction 

Recent research in information system design and conceptual modeling shows
some practical limitations of the object-oriented paradigm. 0 describes
difficulties connected with modeling roles. It was pointed out that the usual
assumption of object-oriented models, namely that an object acquires only one
type for its whole lifetime, is often broken in real-world situations. As a
result, a role model was proposed that defines two operations enabling objects
to acquire and discard types. Other approaches to this problem were proposed
in P and p.

Further studies identified potential complications concerning attribute access
and method dispatching in an object-oriented model extended by roles. This issue
is closely connected to the conflicts accompanying multiple inheritance. An
object can acquire multiple types at runtime that are equivalent to those
defined by multiple inheritance. This gives rise to potential conflicts in the
state and behavior of objects. Such a problem was recognized in |2|, which proves
that the structural conflict in multiple type objects can be solved by static
types (or contexts in which an attribute is referred to). Nevertheless, it was
shown that the context information is not enough to disambiguate method
dispatching. Two approaches were proposed to solve this problem, together with
a discussion of their advantages and disadvantages.

In this paper, we present a solution based on method redefinition
constraints. Our objective is to show that the problem of method dispatching
should not be considered separately but in connection with the semantics of the
role model itself. Our approach uses the formal tools of category theory, in
accordance with our former attempts to describe object-oriented models in terms
of categorical constructions (see 0 and |E|). Categorical modeling manifesto ^
assumes that category theory is especially suited to studying the properties of
object-oriented models.



2 A Model with Multiple Class Objects 

The object-oriented paradigm was developed, besides other things, with the 
aim to support direct representation of real-world entities. However, the usual 
assumption that an object has a structure determined once for all of its lifetime 
breaks this principle. Consider an example of persons, students and readers and 
suppose that the type SR constructed by using the multiple inheritance has not 
been defined yet. In such a database, we can model the roles of the persons by 
using the types S and R. However, we can not represent a person that studies at 
the university and is stored in the database of the university library as a reader at 
the same time. Such a situation is common because information systems usually 
consist of several modules that share the data and each module views them from 
its specific perspective. 

In this case, the object-oriented paradigm provides multiple inheritance to
model this semantics. We can combine the properties of student and reader by
defining a new type student-reader. This technique is adequate for languages
that work with transient objects only. The lifetime of such objects is limited
by the lifetime of the application process that created them. However, in
database applications objects are persistent and exist independently of the
process that created them. Practical experience shows that it is difficult to
foresee all the possible combinations of types that an entity can have. With
the growing complexity of database applications the problem becomes even worse.
It could lead to a combinatorial explosion of the types defined by multiple
inheritance.

To avoid this problem we can introduce so-called multiple type objects
together with two operations for acquiring and discarding types. This concept
spares us from building all the required types by multiple inheritance and
enables objects to play different roles as needed in real-world situations.
Such an extension to the object-oriented model is considered in 0, which
presents the formal semantics of the operations acquire and discard.
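
As a rough illustration of the idea, the sketch below models a multiple type object with acquire and discard operations. It is not the formal semantics referred to above: the class name `MultiTypeObject`, the per-type attribute storage and the `get` accessor are assumptions made for this example. Attribute access takes the context type explicitly, which mirrors the observation that structural conflicts can be resolved by the static type in which an attribute is referred to.

```python
class MultiTypeObject:
    """Hypothetical sketch of an object that can hold several types at once."""

    def __init__(self, base_type, **state):
        self.types = {base_type}                 # names of the types currently held
        self.state = {base_type: dict(state)}    # attributes stored per type

    def acquire(self, type_name, **state):
        """Add a type; its attributes live in their own namespace."""
        if type_name in self.types:
            raise ValueError(f"object already has type {type_name!r}")
        self.types.add(type_name)
        self.state[type_name] = dict(state)

    def discard(self, type_name):
        """Remove a previously acquired type together with its attributes."""
        if type_name not in self.types:
            raise KeyError(type_name)
        self.types.remove(type_name)
        del self.state[type_name]

    def get(self, context, attribute):
        """Attribute access resolved by the context (type) it is referred in."""
        return self.state[context][attribute]


# A person who is a student and a library reader at the same time.
p = MultiTypeObject("person", name="Eva")
p.acquire("student", card_id=17)
p.acquire("reader", card_id=4242)
student_card = p.get("student", "card_id")
reader_card = p.get("reader", "card_id")
p.discard("reader")
```

The two `card_id` attributes do not clash because each is looked up in its own context; the same trick does not settle which implementation of a redefined method to run, which is the problem treated in the rest of the paper.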

As already stated, we used category theory in earlier work to deal with the
problem of attribute access in the context of multiple class objects, where
this approach proved very useful. In this paper, category theory just helps to
describe the structure of classes defined by multiple inheritance using simple
product notation. In the categorical framework, we can describe the semantics
of inheritance using the notion of product:

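Definition 1 (Product) Given two objects S and T in a category, we define
their product as an object $S \times T$ together with projections
$\pi_S : S \times T \to S$ and $\pi_T : S \times T \to T$ such that for any object V and
arrows $q_S : V \to S$ and $q_T : V \to T$, there is a unique arrow
$q : V \to S \times T$ making the following diagram commute:

$$S \xleftarrow{\;\pi_S\;} S \times T \xrightarrow{\;\pi_T\;} T, \qquad
V \xrightarrow{\;q\;} S \times T, \qquad
\pi_S \circ q = q_S, \quad \pi_T \circ q = q_T.$$

□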



However, to describe the semantics of multiple inheritance some additional
commutativity conditions are needed, as explained in j^. In particular, the
notion of a pullback is used to express the semantics of virtual base types. To
illustrate, we show how to define the type hierarchy of the types student S and
reader R, which have the common ancestor person P, and a type SR inheriting from
S and R using multiple inheritance. This simple university database with a
library module is shown in Fig. 1.



[Figure: diamond diagram of the types P, S, R and SR (drawn as S+, R+, SR+) with the projection arrows π_{S:P}, π_{R:P}, π_{SR:S} and π_{SR:R}]

Fig. 1. Inheritance as a product



The semantics of virtual base types can be expressed as the construction of an
object $SR^+$ such that the diagram in Fig. 1 commutes and for each possible SR'
satisfying this condition there is always a unique arrow to $SR^+$. This
construction is called a pullback, which is a specific case of the more general
limit construction. It can be equivalently stated that SR satisfies the
conditions required for S × R by Definition 1, but the commutativity condition
is extended here to the whole diamond diagram of the multiple inheritance.



Behavioral Safety in a Model with Multiple Class Objects 






Thus, the object $SR^+ = S^+ \times^{P^+} R^+ \times SR^-$ is sometimes called a product
restricted over $P^+$. We use this notation, omitting $P^+$ in the superscript of
×, to simplify the description of types. Instead of spelling out the detailed
limit construction we write only the types participating in the product.
Nevertheless, one must not forget the extended commutativity and uniqueness
conditions, which must also be satisfied where applicable.

Following this convention, the structure of $SR^+$ can be expressed in various
ways: $SR^+ = S^+ \times R^+ \times SR^-$ (structure of SR in terms of its most specific
classes) $= P^+ \times S^+ \times R^+ \times SR^-$ (structure of SR in terms of all of its
supertypes) $= P^- \times S^- \times R^- \times SR^-$ (structure of SR in terms of its
components). Notice that in the latter case no additional conditions are
required to be satisfied. In Set, this translates into the following structure:

$$SR^+ = \{(p, s, r, sr) \mid p \in P^-,\ s \in S^-,\ r \in R^-,\ sr \in SR^-\}.$$

In a model that allows multiple inheritance together with method
redefinition, there can be more than one method with the most specific behavior.
The absence of ambiguous implementations of virtual methods will be called
behavioral safety. The condition under which a virtual method is
unambiguous can be expressed as follows:

Definition 2 (Method Unambiguity) The method m is unambiguous if and
only if for all types $T = T_1 \times T_2 \times \ldots \times T_n$, where $T_1, T_2, \ldots, T_n$ are all
the supertypes of T, and for each two $T_i \neq T_j$ that are not in the subtype
relation and both implement m, there is always a $T_k$ implementing m together
with the arrows $\pi_{T_k:T_i} : T_k \to T_i$ and $\pi_{T_k:T_j} : T_k \to T_j$. □
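
A minimal sketch of how Definition 2 could be checked on a concrete hierarchy. The dictionary encoding (`parents` maps each type to its direct supertypes, `implements` maps a type to the methods it defines or redefines) and the function names are assumptions of this example, as is the choice to count a type among its own supertypes, so that a redefinition in T itself can resolve a conflict.

```python
def ancestors(parents, t):
    """All supertypes of t, including t itself (an assumption of this sketch)."""
    seen, stack = set(), [t]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(parents.get(u, ()))
    return seen


def is_subtype(parents, a, b):
    """True if a is b or inherits from b."""
    return b in ancestors(parents, a)


def is_unambiguous(parents, implements, m):
    """Check the condition of Definition 2 for method m on every type."""
    for t in parents:
        impl = [s for s in ancestors(parents, t) if m in implements.get(s, ())]
        for i, ti in enumerate(impl):
            for tj in impl[i + 1:]:
                if is_subtype(parents, ti, tj) or is_subtype(parents, tj, ti):
                    continue  # one implementation overrides the other
                # need some T_k implementing m below both T_i and T_j
                if not any(is_subtype(parents, tk, ti) and is_subtype(parents, tk, tj)
                           for tk in impl):
                    return False
    return True


# The diamond P, S, R, SR from the running example.
parents = {"P": [], "S": ["P"], "R": ["P"], "SR": ["S", "R"]}
conflicting = {"P": {"get_contact"}, "S": {"get_contact"}, "R": {"get_contact"}}
resolved = dict(conflicting, SR={"get_contact"})

ambiguous_case = is_unambiguous(parents, conflicting, "get_contact")
resolved_case = is_unambiguous(parents, resolved, "get_contact")
```

In the first schema S and R both redefine the method and nothing below them does, so the check fails for SR; once SR supplies its own implementation the condition holds.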

3 Preferred Class and Argument Specificity Approaches 

There are two basic solutions to the problem of method dispatching proposed
in |2|. One special case is considered separately in the conclusion of that paper.
Generally, there are two possibilities for dispatching the conflicting method m
for an SR-object. The first concerns the special situation in which m performs
some initialization. In this case it would be reasonable to call both
implementations. Other situations do not allow both methods to be invoked and
require choosing the single implementation that was most probably meant by the
application programmer.

One possibility is to define an order on the types that reimplement the 
method m separately for each context S and R where m can be invoked. For this 
reason, this solution is called preferred class approach. Notice that this approach 
supports context dependent behavior since the order is defined for each context 
in which m can be invoked. 

Another way of choosing the right implementation is to compare the parameters
of the method definitions and choose the one that seems to match the actual
parameters best. Even though this argument specificity approach is not always
able to dispatch a method, it ensures a notion of behavior identity. There is
also another heuristic, which dispatches the method that, according to
statistics, was invoked more frequently in the past. However, the semantics of
the message passing mechanism in such a model would be very indefinite.

4 Constrained Method Redefinition Approach 

The solution presented in this paper is inspired by the study of the concept of 
multiple type objects as related to the real-world situations. We noticed that 
behavioral conflicts often do not point to a limitation of the model which is 
seemingly lacking the means to direct the method invocation in the conflicting 
cases. Rather, they reflect some inconsistency in the database schema itself. 

To illustrate, consider a method get-Contact for person redefined in the 
subtypes student and lecturer. Suppose that this method returns the name of a 
person (in the case of lecturer it adds his degree) and the address for an official 
correspondence. Let us say that the official contact address for a person is his 
home address, for student his university address and for lecturer the address of 
his department. Now we can create an object of the combined type representing, 
for example, Ph.D. student that works as a teaching assistant. 

Invoking the method get-Contact for objects of this combined type in order to
write an official letter to a specific person would cause a behavioral conflict.
Here, some context dependent solution would be preferable. Notice also that it
would be possible to return the answer as a set of all the contact addresses and
let the user choose what to do with this set. In the general case, however, the
method can change the state of the object, so the set of all its possible
results cannot be obtained without the damaging consequences of its side effects.

This conflicting situation can also be solved sensibly earlier, in the process
of database schema design. For the analyst, it is clear from the beginning that
student and lecturer are two roles that objects in the real world can acquire
and discard independently. The object-oriented database model should make it
possible to express this semantics in the database schema and help to reveal
the possible sources of conflicts, thus preventing errors in the design process.

Therefore, rather than deciding which method should be dispatched, we
rule out such a situation altogether. The types equivalent to independent roles
are allowed to redefine their methods only under some strict rules. We call this
principle the constrained method redefinition approach.

5 Refining the Model with Method Redefinition 
Constraints 

Developing the idea of constraining method redefinition in a way that prohibits
behavioral conflicts in multiple type objects, we find many interesting
consequences. First, we describe the way we recommend restricting method
redefinition and show how to express the notion in the inheritance hierarchy
graph. Second, we divide subtypes into two disjoint sets: the first corresponds
to a set of exclusive roles and the other to a set of independent roles. Then
we illustrate the close connection between the notions of method redefinition
constraints and independent roles. Finally, we formulate a condition under
which objects can acquire new types with respect to the method redefinition
constraints. We close the presentation with a proof that in a model with
method redefinition constraints there are no behavioral conflicts for multiple
type objects whose types are changed using the restricted operation acquire.

5.1 Constraints as a Property of Direct Method Inheritance 

We first explain how to constrain method inheritance, because it is simpler.
The motivation that led us to do it this way will be seen later in Section
5.2. The constraints are not connected with specific methods or types; they are
a property of the inheritance itself. In this way the complexity of the model
is kept within reasonable limits. Each edge in the inheritance hierarchy has an
additional marker that denotes the constraint. In the inheritance graph, it
is depicted as a blocked (dashed) edge, as shown in Fig. 2. The methods
inherited through this edge are blocked and cannot be redefined at any
further subtype down the inheritance hierarchy. It should be emphasized that
the block concerns not only the direct subtype but all the subtypes that
can be reached through the dashed edge. However, a method can actually be
redefined for such a subtype, because the graph of multiple inheritance can
provide two paths for method inheritance. This case is allowed, but it has
direct consequences leading to a greater restriction of the types acquirable by
objects. It is discussed further in Section 5.3.



[Figure: inheritance graph with root person; direct subtypes student, reader and employee (arrows π_{S:P}, π_{R:P}, π_{E:P}); further subtypes internal, external, full-time, part-time, lecturer, secretary]

Fig. 2. Graph representation of method redefinition constraints



5.2 Constraints and Independent Roles 

It is interesting to investigate the connection between method redefinition
constraints and the independent roles played by objects in real-world
situations. It should be noted that the arrows going to a type define a
partitioning of its direct subtypes into two disjoint subsets. This can be seen
in Fig. 2, where the subtypes of employee can be partitioned into the two sets
{full-time, part-time} and {lecturer, secretary}. Notice that all the subtypes
in the second set have method redefinition disabled. We call this set a set of
independent roles. This means that such types can be freely combined with each
other to form multiple type objects. The composed objects do not suffer from
behavioral conflicts, since there is only one implementation of the most
specific method, namely the implementation used for employees.

On the other hand, we have the set of exclusive roles. Because we allow
methods to be redefined for the subtypes in this set, we must prevent the
existence of multiple type objects containing more than one type belonging to
the same exclusive roles set, and this exclusion covers all their subtypes as
well. In this way we ensure that there are no behavioral conflicts in the
model. We define formally which types can be combined in the next section. In
the rest of the paper, we denote the independent roles set of a type T as
$T^{ind}$ and its exclusive roles set as $T^{ex}$.

5.3 Acquirable Types 

If the database schema satisfies the method redefinition constraints the behavi- 
oral conflicts will be prevented provided that an object can acquire only limited 
set of types. 

From the notion of the independent roles set one can infer that the
behaviorally safe combinations of the direct subtypes of T consist of arbitrary
types from the independent roles set of T and at most one type E from its
exclusive roles set. We call the set of these combinations the direct safe
combination set of T. If there is no ambiguity concerning the methods in this
set with respect to multiple inheritance, there is no ambiguity in the type
combinations based on this set either.

However, we are not limited to one level of the inheritance hierarchy graph.
Types can be combined from different levels. Therefore, we extend the
notion of the direct safe combination set to a more general safe combination set
$A^{safe}$, defined for a type A as follows:

Definition 3 (Safe Combination Set) Given a type A ∈ Dj, we have $B \in A^{safe}$
if and only if there do not exist a type G, supertype of A, and a type H,
supertype of B, G ≠ H, with a common direct supertype F such that G and H both
belong to the exclusive roles set of F. □
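
The condition of Definition 3 can be sketched as a small predicate. The hierarchy below is hypothetical: it reuses the type names from the text, but which edges are blocked is an assumption of this example, and a type is counted among its own supertypes so that full-time and part-time exclude each other directly.

```python
# Direct supertypes of each type (hypothetical hierarchy using names from the text).
PARENTS = {
    "person": [],
    "student": ["person"], "reader": ["person"], "employee": ["person"],
    "full-time": ["employee"], "part-time": ["employee"],
    "lecturer": ["employee"], "secretary": ["employee"],
    "pt-student": ["student", "part-time"],
    "ft-lecturer": ["lecturer", "full-time"],
}

# Exclusive roles set of each type F: direct subtypes reached through
# non-blocked edges. All other direct subtypes are assumed independent roles.
EXCLUSIVE = {"employee": {"full-time", "part-time"}}


def ancestors(t):
    """All supertypes of t, including t itself (assumption of this sketch)."""
    seen, stack = set(), [t]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(PARENTS[u])
    return seen


def is_safe(a, b):
    """True if b belongs to the safe combination set of a."""
    sup_a, sup_b = ancestors(a), ancestors(b)
    for exclusive in EXCLUSIVE.values():
        # look for G above a and H above b, G != H, both exclusive roles of one F
        if any(g != h for g in sup_a & exclusive for h in sup_b & exclusive):
            return False
    return True


conflict = is_safe("pt-student", "ft-lecturer")   # part-time vs. full-time
independent = is_safe("student", "lecturer")
```

Because the test only compares the exclusive roles found above each of the two types, swapping the arguments gives the same answer.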

Notice the symmetry $B \in A^{safe} \Leftrightarrow A \in B^{safe}$, caused by the existence of the
same F, G and H in both cases. It might seem that in order to combine the types
A and B it would be enough to require G and H to be elements of the independent
roles set of F, as opposed to the non-existence condition on G and H both being
elements of the exclusive roles set. However, the assumption that any method is
redefined only on one path leading through G or H is not true in such a case.
The example in Fig. 3 illustrates that the non-existence requirement is not
superfluous.

Fig. 3. One special case when acquire is restricted

For the types pt-student and ft-lecturer there exist two distinct corresponding
supertypes student and lecturer that both seem to prohibit method redefinition
by the dashed edges. However, there is a potential conflict in any object
derived from pt-student and acquiring ft-lecturer. This object would have the
virtual type assistant, which may have redefined any method inherited from
employee through the types part-time and full-time belonging to the exclusive
roles set of employee.

It is easy to check that ft-lecturer does not belong to the safe combination
set of pt-student according to our definition, since the two types part-time
and full-time break the non-existence condition. We will see the importance
of this point in the next section, which deals with the notion of behavioral
safety.



5.4 Behavioral Safety in the Constrained Model 

We conclude the discussion by proving that the model with constrained method
redefinition and the restricted operation acquire prevents behavioral conflicts.
We can show that there is no behavioral conflict for types of the form A×B,
where $B \in A^{safe}$, directly from Definitions 2 and 3. From the latter we see
that no method m can be implemented in A and B at the same time, because
there do not exist two paths beginning from F that would enable its redefinition.
Thus, if there was no ambiguity for m in A and B themselves, no $T_i$ and $T_j$ as
in Definition 2 can be found for the combined type A×B.

Nevertheless, we must consider that the multiple type A×B can be further
extended. Notice that such a type cannot be freely combined with other types
from $A^{safe}$. Rather, the safe combination set defines a symmetric but
non-transitive relation over the types that can be combined. We admit
reflexivity since it is meaningful with respect to the notion of method
unambiguity. That the relation is not transitive is clear from the hierarchy in
Fig. 2: although full-time, part-time ∈ $student^{safe}$, they cannot be acquired
at the same time. The reason is that the non-existence condition in
Definition 3 may not be satisfied, because there are no requirements on
B, C ∈ $A^{safe}$ distinct from A.

Thus, allowing the types B, C ∈ $A^{safe}$ to be combined arbitrarily with the
type A does not necessarily mean that we can have an object of the multiple
type B×C. All the types that can be combined with the type A are defined using
the maximal safe combination set.

Definition 4 (Maximal Safe Combination Set) Let A ∈ Dj be a type in
the inheritance hierarchy and $A^{safe}$ the safe combination set for the type A.
Then a maximal safe combination set with respect to the type A is a set, denoted
$A^{max}$, satisfying:

(i) $A \in A^{max}$

(ii) $B, C \in A^{max} \Rightarrow B \in C^{safe}$ (and also $C \in B^{safe}$)

(iii) $\forall X$ satisfying (i) and (ii): $A^{max} \not\subset X$ □

The first condition states that A is always a part of $A^{max}$. Second, each
pair of types in $A^{max}$ must allow a type safe combination. Finally, notice
that $A^{max}$ is maximal with respect to the types in the inheritance hierarchy,
i.e. no other type can be added to $A^{max}$ without breaking condition (ii).
There can be several maximal safe combination sets for each type A.
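
The three conditions can be read as a search for maximal sets of pairwise compatible types that contain A. The brute-force sketch below enumerates them for a tiny, hypothetical compatibility relation; the function name, the explicit list of unsafe pairs and the exhaustive enumeration are assumptions of this example and are only practical for small hierarchies.

```python
from itertools import combinations


def maximal_safe_sets(a, types, is_safe):
    """All maximal sets containing `a` whose members are pairwise safe."""
    others = [t for t in types if t != a and is_safe(a, t)]
    found = []
    # Go from large to small so that every non-maximal set is a proper
    # subset of a set that has already been recorded.
    for size in range(len(others), -1, -1):
        for combo in combinations(others, size):
            if all(is_safe(x, y) for x, y in combinations(combo, 2)):
                candidate = frozenset(combo) | {a}
                if not any(candidate < bigger for bigger in found):
                    found.append(candidate)
    return found


# Hypothetical relation: only full-time and part-time exclude each other.
TYPES = ["student", "full-time", "part-time", "lecturer"]
UNSAFE = {frozenset({"full-time", "part-time"})}


def safe(x, y):
    return frozenset({x, y}) not in UNSAFE


student_sets = maximal_safe_sets("student", TYPES, safe)
```

For this relation the type student has two maximal safe combination sets, one containing full-time and one containing part-time, which matches the remark that several such sets can exist for one type.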

Now we can formulate the following proposition for the refined model:

Proposition 1 (Behavioral Safety) Let Dj be a type hierarchy with method
redefinition constraints and without behavioral conflicts. Then for each
maximal safe combination set $A^{max} = \{A_1, A_2, \ldots, A_n\}$ with respect to the
type A ∈ Dj there is no behavioral conflict in the multiple type objects o
based on the types from this set, $o \in MT_A = A_1 \times A_2 \times \ldots \times A_n$. □

In other words, $A^{max}$ defines the types acquirable by A-objects at the same
time while ensuring that there is no ambiguity concerning method dispatching.
Proposition 1 guarantees behavioral safety for all the multiple type objects.
To summarize, the restricted product, denoted $MT_A$, describes the structure of
multiple type objects based on the type A extended by all the types from
$A^{max}$ without the risk of behavioral conflicts.

Proof. We rewrite the multiple type $MT_A$ as $MT_A = T_1 \times T_2 \times \ldots \times T_n$,
where $\{T_1, T_2, \ldots, T_n\}$ are all the supertypes ∈ Dj of $MT_A$. We also know
that $T_j$ does not redefine the methods of $T_i$ if $\pi_{T_j:T_i} : T_j \to T_i$ is
constrained. Let $T_i$ and $T_j$ denote any two distinct elements of the set
$\{T_1, T_2, \ldots, T_n\}$ that are not in the subtype relation, and suppose that
they both implement the method m. Now we have two possibilities.

1. There exists, according to Definition 2, a $T_k$ together with the arrows
$\pi_{T_k:T_i} : T_k \to T_i$ and $\pi_{T_k:T_j} : T_k \to T_j$. This means that $T_k$ must
implement m. Otherwise there would be an ambiguity already in the model,
because $T_k$ ∈ Dj.

2. $T_k$, a subtype of $T_i$ and $T_j$, does not exist. In this case, the rest of
the proof consists in showing that the method m can be implemented in only
one of the types $T_i$ and $T_j$. Let us suppose that m is reimplemented
in both and defined for the first time in F. This means that there must
exist at least two different types G and H, where G is a supertype of $T_i$
and H a supertype of $T_j$, such that H and G are inherited through
non-constrained edges (otherwise the redefinition of m would be restricted and
m could not be redefined in both $T_i$ and $T_j$). However, this contradicts the
assumption that $T_i$ and $T_j$ belong to the maximal safe combination set
according to Definitions 3 and 4. Therefore, m cannot be implemented in
both $T_i$ and $T_j$ and is unambiguous in accordance with Definition 2. □



6 Conclusions 

Considering an extension of the object-oriented model by multiple type objects 
we studied possible solutions to the problem of disambiguating method dispat- 
ching. The issue is carefully investigated in |2j providing the proof that structural 
conflicts cause no problems when the context of attribute access is considered. 
This led us to propose the semantics of operations acquire and discard as pre- 
sented in 0 and study the multiple type objects using the tools of the category 
theory. 

However, the context information is not enough to disambiguate method
dispatching. The two approaches proposed to solve this problem in j2j
seem to make database schema design more complicated for practical use.

Thus, while working on an implementation of an object-oriented database
model closely following the ODMG-93 standard |21, we proposed another
solution that enriches the semantics of method inheritance. Method redefinition
constraints were introduced, which forbid redefining methods inherited through
blocked edges in the rest of the inheritance graph. We could apply this scheme
to each method separately, but it seems that the model would become
unnecessarily complex. Therefore, we connect the constraints with the
inheritance itself. Nevertheless, additional research is needed to clarify this
issue.

We have carefully studied the consequences of this extension, providing formal
semantics of the concepts. One advantage of the chosen approach is that only
meaningful combinations of types are allowed for multiple type objects. This
clarifies the schema design and makes it possible to move the majority of the
runtime checking to compile time. Otherwise, it would not be easy to infer
whether an acquired type is a source of behavioral conflict. Using an algorithm
based on Proposition 1 we can generate a table for efficient runtime checking
of the operation acquire. On the other hand, our approach shares the
disadvantage of the argument specificity approach that it disables static
typing. Even though methods are always dispatchable, the operation acquire can
cause runtime errors. We have also illustrated how the theoretical results
translate into real-world notions understandable to database practitioners.

Evaluating the constrained method redefinition approach, the problems seem
to be clear on the theoretical level. However, there are several possible ways
to restrict the redefinition of methods and, consequently, the corresponding
safe combination relation. The effectiveness and usefulness of the constrained
method redefinition approach have yet to be tested in a real environment.

Abstract. This paper considers the construction of the suffix array of a
string on the MasPar MP-2 architecture. Suffix arrays are space-efficient
variants of suffix trees, a fundamental dictionary data structure that
is the backbone of many string algorithms for pattern matching and
textual information retrieval. We adapt known PRAM techniques for
implementation on the MasPar: bulletin boards, doubling techniques and
sorting methods. Performance results are presented.



1 Introduction 

Given a text T, the retrieval of statistical information about T can be
accomplished using textual search techniques. One such technique is to first
compute an index for T that can subsequently be used to answer a variety of
queries. This index must be efficient in terms of construction time, memory use
and storage space. In the field of string algorithms, the suffix tree (uni, El)
is an elegant data structure that provides an index and a statistical resource
for a given input text: for a text T of length n, a suffix tree of T is a
compacted trie of all the (unique) suffixes of T. It is well known that a suffix
tree can be constructed in linear sequential time and space. However, the
constant hidden by the O(n) space requirement is sufficient to render this data
structure impractical in many real applications. Consequently, recent
algorithms have been devised that consider the practical implications of suffix
trees where space is traded for query time (0.I2!).

An alternative data structure that also provides an index for a given text is
the suffix array [?]. The suffix array of a text T of length n is an array of size
n such that the i-th entry identifies the i-th smallest suffix according to the
lexicographical ordering of strings. The suffix array therefore represents the
leaves of the ordered suffix tree for the text T. Two additional arrays (or a
further 2n - 4 values) associated with longest common prefix information are
required to guide searching in the suffix array. This information represents the
inner (branching) nodes in the equivalent suffix tree. Although the suffix array
requires the longer time of O(n log n) to construct, its redeeming feature is
that it requires less than half the space of the equivalent suffix tree. The
reduction in storage space implies that querying the suffix array is also very
efficient since fewer (external) memory references are required.

In parallel computation, a PRAM algorithm was presented in [?] for the
construction of a suffix tree requiring O(n log n) work and O(log n) time. This
algorithm also requires polynomial space, i.e., O(n^(1+ε)) for 0 < ε < 1. Drawing
from the body of fundamental tools developed for parallel string algorithms, we
present a practical PRAM algorithm for suffix array construction. By definition,
the construction of the suffix array requires sorting techniques. We use the
well-known bulletin board technique to implement a deterministic naming scheme
that also maintains lexicographical order between substrings. The bulletin board
is frequently used in parallel string algorithms and our implementation shows
how it can be used in practice. Indeed, for realistic input sizes, the polynomial
space requirement of the bulletin board soon exceeds that provided by main
computer memory. In this case, we continue the suffix array construction using
a parallel radix sort. Our algorithm resembles that of [12] in that it uses the
doubling technique to derive logarithmic time bounds. However, we feel that a
parallel implementation of the Manber and Myers algorithm is not a practical
option for the MasPar architecture due to our use of external memory. In the
sequel, the description of our algorithm is accompanied by its practical
adaptation for the massively parallel MasPar MP-2 2216.

2 Preliminaries and the MasPar 

A string x is a finite sequence x[1..n] of characters such that each x[i] is drawn
from an alphabet Σ. The length of x is n and is denoted |x|. A substring of x is
a string x[i..j] such that 1 ≤ i ≤ j ≤ n. Furthermore, we say that the substring
x[i..j] occurs at position i in x. A prefix of x is a substring x[1..j] such that
1 ≤ j ≤ n. A suffix of x is a substring x[i..n] such that 1 ≤ i ≤ n.

Given an input text T = x[1..n] and an ordered alphabet Σ, we define the set
of suffixes of T to be {s_1, s_2, ..., s_n} such that s_i = x[i..n]. The suffix
array of T is a dictionary or indexing data structure consisting of the following
components:

(i) An array SA of size n of integers in the range 1..n representing the
lexicographically sorted suffixes of T.

(ii) Two further arrays, left-lcp and right-lcp, each of size n - 2 and
containing integers in the range 0..n - 1. The arrays left-lcp and right-lcp
are used to guide the search during the query procedure.

Each integer value stored in the arrays in (ii) above is the length of the longest
common prefix between a pre-determined pair of suffixes of T. These suffix pairs
correspond exactly to the intervals that arise during a binary search in
SA: i.e., let [j, k], 1 ≤ j < k ≤ n, be any interval that arises during a binary
search in SA. Furthermore, let i = ⌊(j + k)/2⌋ be the midpoint of this interval.
Then the value stored at left-lcp[i] is the length of the longest common prefix
between the suffixes associated with SA[j] and SA[i]. Also, right-lcp[i] is the
length of the longest common prefix between the suffixes associated with SA[i]
and SA[k].

A simple way to determine the supplementary information needed in (ii)
above is to first compute an array LCP of length n - 1 that contains the length of
the longest common prefix between the suffixes represented by SA[i] and SA[i + 1]
for all 1 ≤ i < n. Our implementation uses this approach. Consider the string
T = abbaabaaababbb: then the array SA for T is:

sa[14] = [7, 4, 8, 5, 9, 1, 11, 14, 6, 3, 10, 13, 2, 12]

In this example s_7 is the lexicographically smallest suffix and s_12 is the largest.
The accompanying array LCP for SA is:

lcp[13] = [2, 4, 1, 3, 2, 3, 0, 1, 3, 2, 1, 2, 2]

An optimal algorithm for computing the minimum value in a given interval
can be used to compute left-lcp and right-lcp from the array LCP.
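
The example above can be reproduced with a short sequential sketch. It is written in Python purely for illustration: it sorts the suffixes naively and is not the parallel construction described in this paper. The function names are illustrative, and the interval minimum is taken by a linear scan rather than by an optimal range-minimum algorithm.

```python
def suffix_array(text):
    """Return SA as 1-based suffix start positions, by naive sorting."""
    n = len(text)
    return sorted(range(1, n + 1), key=lambda i: text[i - 1:])


def lcp_length(a, b):
    """Length of the longest common prefix of strings a and b."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def lcp_array(text, sa):
    """LCP entry i compares the suffixes at ranks i and i + 1 of SA."""
    return [lcp_length(text[sa[i] - 1:], text[sa[i + 1] - 1:])
            for i in range(len(sa) - 1)]


def interval_lcp(lcp, j, k):
    """lcp of the suffixes at ranks j < k (1-based): the minimum of the
    LCP entries of the adjacent pairs lying between them."""
    return min(lcp[j - 1:k - 1])


T = "abbaabaaababbb"
SA = suffix_array(T)
LCP = lcp_array(T, SA)
print(SA)
print(LCP)
```

The values of left-lcp and right-lcp for an interval [j, k] with midpoint i are then `interval_lcp(LCP, j, i)` and `interval_lcp(LCP, i, k)`.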



2.1 The MasPar 

The massively parallel idiom is one which gains performance through replication
by linking thousands of simple processing elements (PE’s) via a suitable
interconnection network in a specific topology. In the MasPar MP-2, 16,384 PE’s
are connected in a 128 × 128 two-dimensional mesh known as the PE Array. Each PE
has 64K of local memory, giving an aggregate 1 Gigabyte of distributed memory.
In addition, PE’s can access a shared memory of size 512K (see [?] for more
details). Communications between PE’s are executed in one of two ways. The
XNet interconnect furnishes fast local communications such that each PE can
communicate with one of its 8 nearest neighbours (i.e., the adjacent horizontal,
vertical and diagonal neighbours) in a register-to-register fashion. For more
arbitrary communications there is a Global Router that is implemented using a
multi-stage interconnection network. There is one originating router port and
one target port per cluster of 16 PE’s. The cost of router communication is
constant, i.e., independent of the position of the communicating PE’s. We
implement our algorithms using the MasPar Parallel Application Language,
abbreviated to MPL, which allows the programmer specific control over data
distribution and inter-processor communications; see [?] for details.

3 Data Structures and Techniques 

3.1 Substring Naming 

When gathering string statistics for a string x, the technique known as substring
naming is commonly used to group all equal substrings of x together and to
associate a unique integer with each group. Substring naming together with
recursive doubling (see [?]) are techniques that are widely used in parallel
dictionary computations (see [?]). In [?] it was used in the parallel
construction of suffix trees.

More formally, let x[i..j] be a substring of an input text T of length n
for some 1 ≤ i ≤ j ≤ n. A deterministic substring naming function is a function
that takes a substring such as x[i..j] as an argument and returns an integer value
name. We use ℓ-name to denote the name given to a substring of T of length ℓ and
ℓ-name(i) to denote the name of the substring of length ℓ that starts at position
i in T. The name for a string of length n can be computed in log n applications
of the following naming function: f(name1, name2) = newname, where (name1,
name2) is a tuple of integers, each of which is the name of a substring of length
ℓ. Furthermore, newname is the 2ℓ-name of the substring of length 2ℓ that is
formed by concatenating the substrings associated with name1 and name2, i.e.,
the substrings ℓ-name(i) and ℓ-name(i + ℓ), 1 ≤ i ≤ n.
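
A minimal sequential sketch of naming by doubling is given below. It is an illustration only: the naming function f is realised here by ranking the distinct pairs with a sort, which stands in for the bulletin board of the next section, and the padding name 0 for positions beyond the end of the text is an assumption of the sketch. All function names are illustrative.

```python
def initial_names(text):
    """1-names: rank of each character among the distinct characters."""
    rank = {ch: r for r, ch in enumerate(sorted(set(text)), start=1)}
    return [rank[ch] for ch in text]


def double_names(names, ell):
    """One application of f: turn ell-names into 2*ell-names.

    Position i (0-based) is paired with position i + ell; positions past
    the end of the text use the padding name 0. Ranking the distinct
    pairs keeps the new names in lexicographic order.
    """
    n = len(names)
    pairs = [(names[i], names[i + ell] if i + ell < n else 0)
             for i in range(n)]
    rank = {pair: r for r, pair in enumerate(sorted(set(pairs)), start=1)}
    return [rank[pair] for pair in pairs]


def name_suffixes(text):
    """Apply f until each name covers a whole suffix (about log n rounds)."""
    names, ell = initial_names(text), 1
    while ell < len(text):
        names, ell = double_names(names, ell), 2 * ell
    return names


T = "abbaabaaababbb"
names = name_suffixes(T)
print(names)
```

After the last round every suffix has a distinct name, and that name is the rank of the suffix in the suffix array.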

3.2 Naming Using a Bulletin Board 

A bulletin board BB[1..n, 1..n] is a two-dimensional array data structure that
is frequently used in CRCW PRAM string algorithms. It enables processors to
update the value of a specific parallel variable in constant time as follows: all
processors that wish to write to a location BB[r, c] attempt to do so. Depending
on which CRCW model is used, a write conflict resolution mechanism
determines a winning processor from amongst them that will succeed in writing.
For our algorithm, the naming function is computed by associating the rows
(r) and columns (c) of BB with the name1 and name2 values respectively. We
then use BB to convert two values into one as required by the naming function
as follows: for all processors associated with positions in T such that name1 =
r and name2 = c, the winning processor writes a 1 at location BB[r, c]. This
location is now said to be active. A unique value is then associated with each
active location by computing the prefix sums of all active locations. This is the
value of the variable newname. All processors that attempted to write to BB[r, c]
can now read the same value of newname from this location. For full details of
the above implementation see [?].
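
The write, prefix-sum and read phases can be simulated sequentially as follows. This is a sketch under stated assumptions rather than the MPL implementation: the prefix sums are taken in row-major order (assumed here so that the new names respect the order of the pairs), name 0 is used as padding beyond the end of the text, and the function name is illustrative.

```python
def bulletin_board_names(name1, name2, q):
    """Compute newnames with a (q + 1) x (q + 1) bulletin board.

    name1[i] and name2[i] are the two names held by the processor at
    text position i; q is the largest name in use, 0 is the padding name.
    Returns the list of newnames and the largest newname assigned.
    """
    size = q + 1
    bb = [[0] * size for _ in range(size)]
    # Write phase: every processor marks its cell; conflicting writers
    # all write the same value 1, so any winner gives the same result.
    for r, c in zip(name1, name2):
        bb[r][c] = 1
    # Prefix sums over the active cells, in row-major order, give each
    # active cell a distinct number.
    running = 0
    for r in range(size):
        for c in range(size):
            if bb[r][c]:
                running += 1
                bb[r][c] = running
    # Read phase: each processor reads its newname back from its cell.
    return [bb[r][c] for r, c in zip(name1, name2)], running


T = "abbaabaaababbb"
name1 = [1 if ch == "a" else 2 for ch in T]   # 1-names: a -> 1, b -> 2
name2 = name1[1:] + [0]                        # name of the next position
newnames, largest = bulletin_board_names(name1, name2, 2)
print(newnames, largest)
```

In this run the active cells correspond to the substrings aa, ab, b (the final character followed by padding), ba and bb, which receive the names 1 to 5 in that order.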

The bulletin board is implemented as a set of two-dimensional arrays
distributed across the memory of the PE array. For a bulletin board of size
n × n, and given a total of p processing elements, a simple distribution
allocates processor P_i, i = 0..p - 1, to the bulletin board location
[i div n + 1, i mod n + 1]. When n² > p this data structure is virtualised.
Restricting our array bounds to powers of 2, we found that with 1 Gigabyte of PE
memory we can implement a bulletin board of size 4096 × 4096, together with the
associated longest common prefix data structures. Therefore each PE is
associated with a sub-block consisting of 32 × 32 = 1024 BB locations.
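
The arithmetic behind this distribution can be written out as a small sketch. The first function is the simple allocation stated above; the second assumes that the virtualised board is cut into equal square sub-blocks, one per PE of the 128 × 128 mesh, which is one reading of the sub-block description and not a documented MasPar layout. Both function names are illustrative.

```python
def board_location(i, n):
    """Bulletin board cell [row, col] (1-based) allocated to processor P_i."""
    return (i // n + 1, i % n + 1)


def owning_pe(row, col, board=4096, mesh=128):
    """Mesh coordinates (0-based) of the PE holding cell [row, col]
    (1-based), assuming equal square sub-blocks, one per PE."""
    block = board // mesh          # 4096 // 128 = 32 cells per side
    return ((row - 1) // block, (col - 1) // block)


print(board_location(5, 4))
print(owning_pe(4096, 4096))
```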

Initially, the input alphabet Σ is distributed to all PE’s and the size of the
first bulletin board required is |Σ| + 1. We now remove all characters from Σ that
are not contained in the input text T and assign names to those that remain,
since they represent all substrings of length 1 in the text. The input T is read in
consecutive blocks consisting of p contiguous text characters from the external
memory, assigning one text position to each PE. Initially, it is possible to stack
the text characters onto the PE’s in layers since the bulletin board is very small.
All bulletin board updates for each stage of sorting therefore require O(n/p)
external memory reads (and writes).

3.3 Longest Common Prefix Computation 

Let lcp[i, j]_h denote the length of the longest common prefix between the
substrings of the input text T of length 2^h starting at positions i and j. For
1 ≤ h ≤ log |T| we have

\[
\mathrm{lcp}[i,j]_h =
\begin{cases}
2^{h-1} + \mathrm{lcp}[i + 2^{h-1},\, j + 2^{h-1}]_{h-1} & \text{if } \mathrm{lcp}[i,j]_{h-1} = 2^{h-1},\\
\mathrm{lcp}[i,j]_{h-1} & \text{otherwise.}
\end{cases}
\tag{1}
\]
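
Recurrence (1) can be checked against a direct character comparison with the following sketch. It uses 0-based positions and treats substrings that run past the end of the text as truncated; both conventions, and the function names, are assumptions made for the illustration.

```python
def lcp_direct(t, i, j, length):
    """lcp of t[i:i+length] and t[j:j+length], compared character by character."""
    k = 0
    while (k < length and i + k < len(t) and j + k < len(t)
           and t[i + k] == t[j + k]):
        k += 1
    return k


def lcp_doubling(t, i, j, h):
    """The same quantity for length 2**h, computed through recurrence (1)."""
    if h == 0:
        return lcp_direct(t, i, j, 1)
    half = 1 << (h - 1)
    first = lcp_doubling(t, i, j, h - 1)
    if first == half:
        # The first halves agree completely: continue in the second halves.
        return half + lcp_doubling(t, i + half, j + half, h - 1)
    return first


T = "abbaabaaababbb"
print(lcp_doubling(T, 3, 7, 3), lcp_direct(T, 3, 7, 8))
```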



The value of the longest common prefix that exists between each suffix and its
right neighbour (i.e., the suffix that is currently the next largest one) is updated
at each stage of naming using the following arrays:

1. a q × q array NEW_lcp is used to store the values of the longest common
prefix which exist between every pair of newnames computed in the current
stage (q is the largest name assigned in the current stage);

2. a w × w array OLD_lcp is used to store the values of the longest common
prefix which exist between every pair of newnames computed in the previous
stage (w is the largest name assigned in the previous stage).

These arrays are distributed across PE memory and we exploit the
architecture’s ability to efficiently broadcast values within the rows and columns
of the PE array.



4 Bulletin Board and LCP Table: Performance Results 

In Algorithm LCP below we describe how the LCP values are maintained. The
algorithm takes as input the array OLD_lcp and it outputs the array NEW_lcp.
An “active” processor is one which is associated with a bulletin board location
that has been marked 1 (and was therefore written to). Consider the computation
to determine the longest common prefix that exists between two substrings at
positions i and j in T. Let α1, β1 denote the values of name1 and name2
associated with the substring at position i and α2, β2 denote the values of name1
and name2 associated with the substring at position j. From (1) we can
immediately derive the following lemma that is used to guide the longest common
prefix computation:

Lemma 1. Using the α, β representation of substrings, the common prefix
between two substrings such that α1 ≠ α2 does not change from the previous
iteration of naming.




376 C.S. Iliopoulos and M. Korda 



Algorithm LCP

STEP 1: The first row and first column, r_0 and c_0, of NEW_lcp are initialised
as follows: each active processor writes its name1 value to both α1 of processor
p_s in row r_0 and to α2 of processor p_s in column c_0. The same processor then
writes its name2 value to both β1 of processor p_s in row r_0 and to β2 of
processor p_s in column c_0.

STEP 2: All locations in NEW_lcp are now initialised as follows: each processor
p_s of row r_0 broadcasts α1 and β1 to all processors in the same column, c_s.
Similarly, each processor p_s of the column c_0 broadcasts α2 and β2 to all the
processors in the same row, r_s.

STEP 3: Each location in NEW_lcp has now received 4 values which are grouped
into the two pairs: γ = (α1, α2) and ζ = (β1, β2).

STEP 4: All locations in the table NEW_lcp now consider their γ and ζ pairs. If
α1 ≠ α2, then by (1) and Lemma 1 the longest common prefix value does not
change; otherwise the longest common prefix value is equal to
|α1| + OLD_lcp[β1, β2].
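
The table update of STEP 4 can be sketched sequentially as below; the broadcasts of STEPs 1 to 3 are replaced by plain loops. Several details are assumptions of the sketch and not taken from the MPL code: the pairs are named by sorting (a stand-in for the bulletin board), name 0 denotes the empty padding string, and |α1| is read off the diagonal of OLD_lcp so that a substring truncated by the end of the text is handled too. All function names are illustrative.

```python
def rank_pairs(pairs):
    """Give each distinct (name1, name2) pair a newname 1..q in sorted order."""
    table = {pair: s for s, pair in enumerate(sorted(set(pairs)), start=1)}
    return [table[pair] for pair in pairs], table


def update_lcp(table, old_lcp):
    """Build NEW_lcp from OLD_lcp; table maps (name1, name2) -> newname.

    Row and column 0 of both arrays belong to the padding name 0 and stay 0.
    """
    q = len(table)
    new_lcp = [[0] * (q + 1) for _ in range(q + 1)]
    for (alpha1, beta1), s in table.items():
        for (alpha2, beta2), u in table.items():
            if alpha1 != alpha2:
                # Lemma 1: the value is unchanged from the previous stage.
                new_lcp[s][u] = old_lcp[alpha1][alpha2]
            else:
                # |alpha1| + OLD_lcp[beta1, beta2]
                new_lcp[s][u] = old_lcp[alpha1][alpha1] + old_lcp[beta1][beta2]
    return new_lcp


def common_prefix(a, b):
    """Length of the longest common prefix of two strings."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k


T = "abbaabaaababbb"
n = len(T)
alphabet = sorted(set(T))
names = [alphabet.index(ch) + 1 for ch in T]       # 1-names
w = len(alphabet)
lcp_table = [[int(a == b and a > 0) for b in range(w + 1)]
             for a in range(w + 1)]
ell = 1
while ell < n:
    pairs = [(names[i], names[i + ell] if i + ell < n else 0)
             for i in range(n)]
    names, table = rank_pairs(pairs)
    lcp_table = update_lcp(table, lcp_table)
    ell *= 2
print(lcp_table[names[6]][names[3]])
```

Once the names cover whole suffixes, `lcp_table[names[i]][names[j]]` is the longest common prefix of the suffixes at (0-based) positions i and j.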

Table 1 shows how rapidly the size of the bulletin board grows for increasing
lengths of English text and DNA. The results show that the bulletin board size
for DNA and English text inputs grows very rapidly and that only 3 and 2
iterations respectively of naming are possible for input sizes larger than 16384:
suffixes are sorted according to a uniform prefix of length 8. Indeed, the largest
bulletin boards that are made use of are of size 260² for DNA and 539² for
English text.



Table 1. Iterations = number of iterations of naming, Last BB = largest BB realised,
Next BB = next BB size required, Secs = time in seconds.

DNA
n      Iterations   Last BB   Next BB   Secs
8k     11           260       6818      88.07
16k    3            260       12002     3.41
32k    3            260       21946     4.62
64k    3            260       34767     7.83
128k   3            260       47664     14.59

English Text
n      Iterations   Last BB   Next BB   Secs
8k     4            4036      4138      31.52
16k    2            450       7222      77.34
32k    2            487       12389     3.09
64k    2            507       15150     4.99
128k   2            539       22611     9.09



5 Radix Sort via Merge-Sort 

In Section 4 we showed that the bulletin board cannot be used to construct
SA for large input sizes due to memory restrictions. When our memory limit is
reached, say at iteration j of renaming, the input text T can be represented by
an unordered array of integers in the range 1..q, where q is the largest newname
computed in the last iteration of the naming stage. By sorting these integers
(each tagged with the index of the suffix it represents), the suffix indices can now
be rearranged into their current lexicographic order. In our implementation we
do this using parallel merge-sort. In this way, all suffixes that share a common
prefix of length at least ℓ are placed in contiguous locations of a partially
completed suffix array, PART_SA. Furthermore, each suffix can now be encoded by
a new, shorter string as follows: let s_i be a suffix of T. We define the q-string of
s_i to be (k_1, k_2, ..., k_t) such that

(i) k_j, for some 1 ≤ j ≤ t, is the name of the substring of length ℓ starting at
position i + ℓ(j - 1) in T.

(ii) 0 ≤ k_j ≤ q, for 1 ≤ j ≤ t.

We call each k_j a k-component and use the value 0 to denote a k-component
that occurs beyond position n in the input. At this stage the construction of the
array SA is completed using a radix sort to base q + 1, and each round of this
radix sort is a parallel merge-sort.
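
The q-string encoding can be illustrated with the sketch below, which builds the q-strings of the example text for ℓ = 2 and orders the suffixes by comparing q-strings component by component. Positions are 0-based, the names are obtained by sorting substrings (standing in for the naming stage), and Python's built-in tuple comparison plays the role of the radix sort; these are choices made for the illustration only.

```python
def ell_names(text, ell):
    """Order-preserving names 1..q of the substrings text[i:i+ell]."""
    subs = [text[i:i + ell] for i in range(len(text))]
    rank = {s: r for r, s in enumerate(sorted(set(subs)), start=1)}
    return [rank[s] for s in subs]


def q_string(names, i, ell):
    """q-string of the suffix at 0-based position i: the names found at
    positions i, i + ell, i + 2*ell, ..., with 0 past the end of the text."""
    n = len(names)
    t = -(-n // ell)                      # number of k-components
    return tuple(names[i + ell * j] if i + ell * j < n else 0
                 for j in range(t))


T = "abbaabaaababbb"
ELL = 2
names = ell_names(T, ELL)
order = sorted(range(len(T)), key=lambda i: q_string(names, i, ELL))
print(q_string(names, 0, ELL))
print([i + 1 for i in order])
```

Each k-component lies between 0 and q, so a q-string can be read as a number written in base q + 1, which is what makes a radix sort applicable.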



Fig. 1. Merge-Sort applied to DNA, highly periodic and text strings (plot title:
Virtualised Sort for DNA, Periodic and Text).



5.1 Parallel Merge-Sort 

The underlying computational structure of the traditional PRAM algorithm for
merge-sort is a complete binary tree. Our merge procedure is an adaptation of the
O(log log n) merging of [?] and [?]. To avoid the communication overheads that
are incurred by their O(1) parallel ranking procedure, we implement ranking
using a simple binary search. This increases the (theoretical) time complexity
for the merging to O(log n log log n). We implement the binary tree computation
using a 2 × n array SORT. The merge-sort procedure requires O(log² n log log n)
time, using n processors. For p < n processors this data structure is virtualised
using a cut-and-stack data mapping as in [?]: no two contiguous input elements
reside on the same processor. Figure 1 shows the time taken to perform this
merge-sort for three different input types.
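
Merging by ranking with a binary search can be sketched as follows. Every element computes its output slot independently of the others, which is the property that a parallel implementation exploits; here the loops simply run sequentially. The sketch uses Python's standard `bisect` module and illustrative function names, and it makes no attempt to model the cut-and-stack mapping.

```python
from bisect import bisect_left, bisect_right


def merge_by_ranking(a, b):
    """Merge sorted lists a and b; each element finds its own output slot."""
    out = [None] * (len(a) + len(b))
    for i, x in enumerate(a):
        # Slot = own index + number of elements of b strictly smaller than x.
        out[i + bisect_left(b, x)] = x
    for j, y in enumerate(b):
        # Slot = own index + number of elements of a not larger than y.
        out[j + bisect_right(a, y)] = y
    return out


def merge_sort(values):
    """Bottom-up merge-sort: one level of the binary tree per iteration."""
    runs = [[v] for v in values]
    while len(runs) > 1:
        runs = [merge_by_ranking(runs[k], runs[k + 1])
                if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []


print(merge_sort([5, 3, 8, 1, 9, 2, 5]))
```

Using `bisect_left` for one list and `bisect_right` for the other sends equal elements to distinct slots, with the elements of the first list placed first.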

From Figure 1 we see a sudden jump in the running time for the DNA input
of size 32k. This particular sample contains 29 occurrences of the substring
TTTTTTTT; the other samples of DNA contain no more than 6 such substrings.
In comparison, the input samples of text contain no such repetitive substrings
of this length. Substring statistics in relation to suffix trees have been studied
extensively by Szpankowski in [?].

5.2 Segmented Merge-Sort 

Scan primitives can be applied to vectors which are divided into contiguous
blocks called segments. The segments are defined using flags which mark the
end-of-segment boundaries. If there are m such flags in a given input vector V,
the primitive will be executed m + 1 times in parallel, once for each segment in
V. In our implementation we consider each segment as a bucket, the contents of
which are the indices of suffixes of T that share a common prefix. The buckets
are refined using segmented merge-sort until each contains only one suffix index.
The segment boundaries are defined using the following rule: any virtual
processor that is associated with a different ℓ-name from its right neighbour
signifies the end of a segment. We achieve the effect of applying merge-sort to all
segments in parallel by renumbering each (virtual) processor so that the first
processor of each segment has number 0. For each virtual processor, its new
processor number new_proc depends on the position at which its segment starts
and on its original (virtual) processor number. Algorithm RADIX below takes as
input the following items: (i) an unordered array of names in the range 1..q,
where q was the largest newname assigned in the last iteration, say iteration j,
of renaming (each name represents the prefix of length 2^j of each suffix of T);
(ii) the associated LCP values; (iii) the table NEW_lcp, also computed in
iteration j of renaming. The output is the completely sorted suffix array, SA.
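
The flagging rule and the renumbering can be sketched sequentially as follows. The function names are illustrative, and the running variable that tracks the start of the current segment takes the place of what would be a segmented scan on the parallel machine.

```python
def segment_flags(names):
    """flags[v] is True when processor v holds a different name from its
    right neighbour, i.e. when v is the last processor of its segment."""
    n = len(names)
    return [v < n - 1 and names[v] != names[v + 1] for v in range(n)]


def renumber(flags):
    """new_proc[v] = v - (start of v's segment): each segment counts from 0."""
    new_proc, start = [], 0
    for v, is_end in enumerate(flags):
        new_proc.append(v - start)
        if is_end:
            start = v + 1
    return new_proc


names = [1, 1, 2, 4, 4, 4, 5]
flags = segment_flags(names)
print(flags)
print(renumber(flags))
```

With m flags set, the vector splits into m + 1 segments, matching the count given above.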

Algorithm RADIX

STEP 1. Each suffix s_i is represented by its q-string q(i). Compute the partial
suffix array PART_SA as follows: apply the merge-sort procedure to the first
k-component, k_1, of each q-string. Each q-string is relocated to its newly sorted
position together with the index of the suffix which it represents and the
associated LCP value.

For j = 1..⌊n/ℓ⌋ do each of the following substeps:

STEP 2a. Using component k_j as a key, mark all locations t in PART_SA such that
the k-component for PART_SA[t] is not equal to the k-component for
PART_SA[t + 1]. This partitions the array PART_SA into segments such that all
locations with the same k-component belong to the same segment or equivalence
class.

STEP 2b. Compute the size of each segment. If all segments are of size 1, then
compute the LCP values for the entire input and terminate the computation.
Else, for all locations contained in segments of size 1, store the value of k_j in K
and j - 1 in J, and mark the location as “non-active”.

STEP 2c. For all “active” segments of size greater than 1, apply the merge-sort
procedure using the component k_{j+1} as the sort key. Remove k_j from each q(i)
and send the remaining q-string to its new sorted location.
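
The segment refinement of Algorithm RADIX can be sketched sequentially as below. The sketch keeps the complete q-strings instead of removing used components, omits the LCP bookkeeping and the K and J arrays, and uses Python's built-in sort where the algorithm uses a segmented parallel merge-sort; the helper that builds the q-strings by sorting substrings is likewise an assumption. All function names are illustrative.

```python
def q_strings(text, ell):
    """q-string of every suffix: names of the length-ell substrings at
    positions i, i + ell, ..., with 0 beyond the end of the text."""
    n = len(text)
    subs = [text[i:i + ell] for i in range(n)]
    rank = {s: r for r, s in enumerate(sorted(set(subs)), start=1)}
    t = -(-n // ell)                      # number of k-components
    return [tuple(rank[subs[i + ell * j]] if i + ell * j < n else 0
                  for j in range(t)) for i in range(n)]


def radix_refine(qs):
    """Return the suffix indices (0-based) in sorted order."""
    # STEP 1: sort on the first k-component.
    order = sorted(range(len(qs)), key=lambda i: qs[i][0])
    segments, j = [(0, len(order))], 0
    while segments:
        # STEP 2a/2b: split where component j changes; keep sizes > 1.
        active = []
        for lo, hi in segments:
            start = lo
            for pos in range(lo + 1, hi + 1):
                if pos == hi or qs[order[pos]][j] != qs[order[start]][j]:
                    if pos - start > 1:
                        active.append((start, pos))
                    start = pos
        # STEP 2c: sort every active segment on the next component.
        j += 1
        for lo, hi in active:
            order[lo:hi] = sorted(order[lo:hi], key=lambda i: qs[i][j])
        segments = active
    return order


T = "abbaabaaababbb"
print([i + 1 for i in radix_refine(q_strings(T, 2))])
```

Because distinct suffixes have distinct q-strings, no segment of size greater than 1 can survive the last component, so the loop terminates before running out of components.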

6 Conclusion 

We give an algorithm for the parallel construction of the suffix array data
structure. Our empirical results show that the PRAM bulletin board technique for
naming and sorting substrings requires more main memory than is available on
a MasPar MP-2 for practical input sizes. This is partly due to the fact that
we also maintain tables for computing longest common prefix information. The
theoretical time of O(log² n) “reduces” to O(kn/p log n), where k ≤ log n is the
number of BB iterations that main memory can accommodate and p is the number
of PE’s. However, the bulletin board can be used to reduce the input size and to
create a more succinct representation of each suffix, the q-string. Each character
in a q-string represents a substring of length 2^h for some 0 ≤ h ≤ log n. For DNA
and English text the length of q-strings is in the region of n/4. A radix sort is
then used to complete the suffix array construction. Assuming that the main
memory of the PE array holds n integer values, one round of the radix sorting is
implemented using a segmented merge-sort and requires O(n/p log² m log log m)
time, where m ≤ n is the size of the largest segment or bucket to be sorted in
the current round. The number of rounds of radix sort required depends on the
length of the longest repeated substring in the input: for DNA and English text
we found that at most 3 rounds were required to eliminate all but a small fraction
of q-strings from the sort.

Abstract. An alternative proof (to that from [?]) is presented for weak
bisimilarity of transition systems to be abstractly definable by open maps
[?]. This result arises from an observation that the categories (of
transition systems) well suited for studying strong and weak bisimulations are
related by an adjunction (for a suitable monad), giving a link between
both bisimilarities. We formulate a generalization of this result, hopefully
applicable also to other equivalences of processes.



1 Introduction 

Recently a categorical generalization of bisimulation was proposed, by means of
open maps (open morphisms) [?], enabling a uniform definition of
bisimulation-like equivalences across a range of different models for parallel
computations. This setting turned out to be appropriate for defining, among many
others, strong and weak bisimilarity, trace equivalence [?], testing equivalence
and a bisimilarity of event structures [?]. Open maps can be understood as arrows
witnessing a bisimulation, hence two objects A and B in a category are bisimilar
if they are related by a span of open maps, representing abstractly a bisimulation:

\[
A \longleftarrow C \longrightarrow B \tag{1}
\]

In [?] one can find an overview of different equivalences definable by means of
open maps. Open maps were also successfully applied to behavioural equivalences
of algebras, see [?].

In this paper we focus on weak bisimilarity of transition systems, proved
already in [?] to coincide with open-maps bisimilarity in the category of
transition systems. The method was similar to that used for strong bisimilarity,
that is, open maps were characterized as those satisfying a suitable zig-zag
condition, to be mentioned below. The only difference from strong bisimilarity is
that the category which turned out to be appropriate to work in was richer in
morphisms than the usually considered category of transition systems (which is
well suited for strong bisimilarity). We found an abstract categorical
characterization of this category as the Kleisli category for a suitable monad on
the category of transition systems. Both categories under consideration are
moreover linked by a pair of functors forming an adjunction, which we show to be
reflective. These observations lead directly to an alternative proof of the
coincidence of weak bisimilarity and open-maps bisimilarity, which does not
require any explicit characterization of open maps. This is due to properties of
open maps, proved recently in [?], which allow one to “transport” openness of
morphisms (and bisimilarity) via an adjunction.

* This work was supported by the KBN grant 8 T11C 046 14.

On a more intuitive level, the adjunction gives an elegant connection
between (abstract formulations of) strong and weak bisimilarity. There still
remains an interesting question: can this situation be carried over to some other
equivalences of processes? This is why all the properties of preservation of
openness and bisimilarity, formulated in this paper as well as in [?], are intended
to be re-usable and as general as possible.

We assume the reader to have some prior knowledge of category theory,
in particular to be familiar with adjunctions and monads. As a reference, we
propose [?].

2 Bisimulation from Open Maps 

Let U be a category of models of computation, in which we choose a subcategory
P (not necessarily full) of observation objects. P is also called in the sequel a
path subcategory, as it is intended to contain “paths” for computations to follow.
Any morphism p : O → A from an observation object O ∈ |P| is understood as
an observable computation in A. A morphism h : A → B between models can
be intuitively thought of as a simulation of A in B, since h transforms every
computation p : O → A in A to a computation p;h : O → B in B. Moreover, any
morphism m : O → O' in P such that p = m;p' means intuitively that a “larger”
computation p' is an extension of p (via m).

In the definition below, we moreover distinguish a subcategory M of U of models
of interest, and consider bisimilarity only in M. This way we gain a notion
slightly more general than usual (when M = U), following the approach of [?].
The path subcategory P is not required to be a subcategory of M.

Definition 2.1 (Open maps and bisimilarity). A morphism h : A → B in
M is P-open if for any morphism m : O → O' in P and two computations
p : O → A and p' : O' → B in U, whenever the square formed by p, h, m and p'
commutes, i.e. p;h = m;p', there exists a diagonal morphism r : O' → A in U
making the two triangles commute, i.e. p = m;r and p' = r;h. Two objects A and
B are P-bisimilar, denoted by A ~_P B, if there exists in M a span of P-open
maps as in (1). We omit the prefix P- when it is obvious from the context. If not
stated otherwise, we assume U = M.

The abstract notion of P-bisimilarity is intended to generalize the strong
bisimilarity between transition systems, which will serve as an illustrating
example.



2.1 Strong Bisimulation 

A labelled transition system [?] (over a set L, intended to give labels to
transitions) is a triple T = (S, i, {-a->}_{a∈L}) consisting of a set of states S
with a distinguished initial state i ∈ S and a family of transition relations
_ -a-> _ ⊆ S × S. We write s -a-> s' to denote that states s and s' are related
by a transition labelled by a.

Morphisms between transition systems are “structure preserving” functions
mapping states to states. Formally, a morphism from T1 = (S1, i1, {-a->}_{a∈L})
to T2 = (S2, i2, {-a->}_{a∈L}) is a function σ : S1 → S2 such that σ(i1) = i2
and s -a-> s' implies σ(s) -a-> σ(s') (we deliberately overload the symbol -a->
here, hoping that this causes no confusion). This defines a category TS_L, in
which morphisms compose as functions.

Following [?], we say that T1 and T2 are strongly bisimilar if there exists a
strong bisimulation between them; bisimulation is defined as usual, with the only
additional requirement to relate initial states. Strong bisimilarity was shown in
[?] to coincide with Bran_L-bisimilarity in TS_L, where Bran_L is the full
subcategory of transition systems consisting of finite sequences of actions:

\[
s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} s_n \tag{2}
\]

Bran_L-open morphisms σ : T1 → T2, also called zig-zag morphisms, are those
satisfying the following zig-zag property: for each reachable s ∈ S1, whenever
σ(s) -a-> s', then s -a-> s'' for some s'' ∈ S1 satisfying σ(s'') = s'.
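
For finite transition systems both the morphism condition and the zig-zag property can be checked mechanically. The sketch below is an illustration under assumed encodings: a transition system is a pair (initial state, set of (source, label, target) triples), a map σ is a dictionary defined on all states of the first system, and all function names are hypothetical.

```python
def reachable(init, trans):
    """States reachable from init by following transitions."""
    seen, stack = {init}, [init]
    while stack:
        s = stack.pop()
        for (u, _, v) in trans:
            if u == s and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_morphism(sigma, t1, t2):
    """sigma preserves the initial state and every labelled transition."""
    (init1, trans1), (init2, trans2) = t1, t2
    return sigma[init1] == init2 and all(
        (sigma[u], a, sigma[v]) in trans2 for (u, a, v) in trans1)


def is_zigzag(sigma, t1, t2):
    """Every transition out of sigma(s), s reachable, lifts to one out of s."""
    (init1, trans1), (_, trans2) = t1, t2
    for s in reachable(init1, trans1):
        for (u, a, target) in trans2:
            if u == sigma[s] and not any(
                    (s, a, s2) in trans1 and sigma[s2] == target
                    for s2 in sigma):
                return False
    return True


T1 = (0, {(0, "a", 1), (0, "a", 2)})
T2 = ("x", {("x", "a", "y")})
T3 = ("x", {("x", "a", "y"), ("x", "b", "z")})
sigma = {0: "x", 1: "y", 2: "y"}
print(is_morphism(sigma, T1, T2), is_zigzag(sigma, T1, T2))
print(is_morphism(sigma, T1, T3), is_zigzag(sigma, T1, T3))
```

In the second case σ is still a morphism, but the b-transition of the target has no counterpart in the source, so the zig-zag property fails.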



3 Weak Bisimulation 

Consider transition systems over an alphabet L, fixed in the sequel. We assume L
to contain a distinguished action τ, being silent or non-observable. Weak
bisimulation, being less restrictive than the strong one, allows an action a to be
simulated by a sequence of the form

\[
\xrightarrow{\tau} \cdots \xrightarrow{\tau} \xrightarrow{a} \xrightarrow{\tau} \cdots \xrightarrow{\tau}
\]

that is, by a preceded or followed by an arbitrary number of τ actions. Moreover,
a τ action need not be simulated at all. In the formal definition below we use
the following notation, proposed in [?]. First, let s =a=> s' denote a sequence of
transitions of the above form, for an arbitrary label a (including τ), that is

s =a=> s' iff s (-τ->)* r -a-> r' (-τ->)* s' for some states r and r',

where (-τ->)* stands for the reflexive-transitive closure of -τ->. Second, treating
an action a as a one-element sequence, we define a τ-deleting function ^ : L → L*
by â =def a, for a ≠ τ, and τ̂ =def ε, the empty sequence. Moreover, we assume
s =ε=> s' iff s = s', implicitly extending the notation _ =a=> _ to the set L* of
finite sequences of actions.
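
For a finite transition system the relation =a=> and the τ-deleting function can be computed directly from their definitions. In the sketch below the encoding of transitions as (source, label, target) triples, the string used for the silent action and the function names are assumptions made for the illustration.

```python
TAU = "tau"


def tau_closure(states, trans):
    """Reflexive-transitive closure of the tau transitions: state -> set."""
    closure = {s: {s} for s in states}
    changed = True
    while changed:
        changed = False
        for (u, a, v) in trans:
            if a == TAU:
                for s in states:
                    if u in closure[s] and v not in closure[s]:
                        closure[s].add(v)
                        changed = True
    return closure


def weak_step(states, trans, a):
    """All pairs (s, s') with s =a=> s', i.e. s tau* r -a-> r' tau* s'."""
    closure = tau_closure(states, trans)
    return {(s, s2)
            for s in states
            for (r, b, r2) in trans if b == a and r in closure[s]
            for s2 in closure[r2]}


def hat(a):
    """The tau-deleting function, with sequences represented as tuples."""
    return () if a == TAU else (a,)


states = {0, 1, 2, 3}
trans = {(0, TAU, 1), (1, "a", 2), (2, TAU, 3)}
print(sorted(weak_step(states, trans, "a")))
print(sorted(weak_step(states, trans, TAU)))
```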

Definition 3.1. A weak bisimulation between T1 and T2 is any relation ~ ⊆
S1 × S2 such that i1 ~ i2 and whenever s1 ~ s2 then for every action a ∈ L,

1. s1 =a=> s1' in T1 implies s2 =â=> s2' in T2 and s1' ~ s2', for some s2',

2. s2 =a=> s2' in T2 implies s1 =â=> s1' in T1 and s1' ~ s2', for some s1'.

If such a relation ~ exists, we say that T1 and T2 are weakly bisimilar.

To deal with weak bisimilarity we choose the appropriate category WTS_L
of transition systems whose morphisms σ : T1 → T2 are functions σ : S1 → S2
such that σ(i1) = i2 and s -a-> s' implies σ(s) =â=> σ(s'). These morphisms will
be called weak morphisms; the usual morphisms of transition systems we call
strong in the sequel. Obviously the category TS_L, consisting of strong morphisms,
is a subcategory of WTS_L.

As a path subcategory for weak bisimulation we take again the subcategory
Bran_L of finite linear transition systems as in (2), together with exclusively
strong morphisms between them. Surprisingly, this is the same path subcategory
as in the case of strong bisimilarity; this coincidence will be justified and
explained below. Obviously Bran_L is not a full subcategory of WTS_L, being a full
subcategory of TS_L. In [?] it was shown that weak bisimilarity coincides with
Bran_L-bisimilarity in the category of weak morphisms; this was achieved by an
explicit characterization of Bran_L-open maps as those satisfying an appropriate
zig-zag property, analogously to the case of strong bisimilarity.

4 Characterization 

Our aim is to show that spans of BranL-open morphisms in category WTSl 
define weak bisimilarity without referring to any explicit formulation of openness. 
Our considerations will be based on the observation that bisimilarity induced by 
open morphisms can be transported in some relevant cases via an adjunction. 
First, open maps are well-behaved with respect to an adjunction: 

Lemma 4.1 ([?]). For an arbitrary adjunction $F \dashv G$ between categories $\mathcal{M}$ and
$\mathcal{N}$ and for an arbitrary subcategory $\mathcal{P}$ of $\mathcal{M}$,

a morphism $h$ in $\mathcal{N}$ is $F(\mathcal{P})$-open $\iff$ $G(h)$ is $\mathcal{P}$-open in $\mathcal{M}$.

Second, although bisimilarity itself is not transported in a similar way in
general, it is when the adjunction is a reflection:

Lemma 4.2. Assume that the adjunction from the previous lemma is reflective
(i.e. the right adjoint $G : \mathcal{N} \to \mathcal{M}$ is full and faithful). Then

$A \sim_{F(\mathcal{P})} B$ in category $\mathcal{N}$ $\iff$ $G(A) \sim_{\mathcal{P}} G(B)$ in category $G(\mathcal{N})$

($G(\mathcal{N})$, the image of $G$, is a subcategory of $\mathcal{M}$).

Proof: Assume that $A$ and $B$ are $F(\mathcal{P})$-bisimilar in $\mathcal{N}$, that is, they are related
by a span $A \stackrel{f}{\longleftarrow} C \stackrel{g}{\longrightarrow} B$ of $F(\mathcal{P})$-open maps in $\mathcal{N}$. We can easily construct a
span of $\mathcal{P}$-open maps by taking the image of $G$ on $C$, $f$ and $g$. By Lemma 4.1
both morphisms forming the span $G(A) \stackrel{G(f)}{\longleftarrow} G(C) \stackrel{G(g)}{\longrightarrow} G(B)$ are $\mathcal{P}$-open, hence
$G(A)$ and $G(B)$ are $\mathcal{P}$-bisimilar.

For the opposite direction, let $A$ and $B$ be arbitrary objects of $\mathcal{N}$ such that
$G(A)$ and $G(B)$ are $\mathcal{P}$-bisimilar in $G(\mathcal{N})$. This means that $G(A)$ and $G(B)$
are related by a span $G(A) \stackrel{f}{\longleftarrow} C \stackrel{g}{\longrightarrow} G(B)$ of $\mathcal{P}$-open maps and moreover
there exist some morphisms in $\mathcal{N}$, say $f'$ and $g'$, which are mapped by $G$ to $f$
and $g$, respectively. Let $C'$ and $D'$ denote the domains of $f'$ and $g'$, respectively.
Since $G(C') = G(D')$ and $G$, being full and faithful, reflects isomorphisms, we
conclude $C' \cong D'$ and obtain a span $A \stackrel{f'}{\longleftarrow} C' \cong D' \stackrel{g'}{\longrightarrow} B$. Moreover, $f'$ and $g'$
are $F(\mathcal{P})$-open, again by Lemma 4.1, hence $A \sim_{F(\mathcal{P})} B$.

A careful reader may have noticed that for $\mathcal{P}$-bisimilarity in the category $G(\mathcal{N})$,
the path subcategory $\mathcal{P}$ is not guaranteed to be a subcategory of $G(\mathcal{N})$. This
motivates our general definition of bisimilarity in Section [?].

A fact similar to Lemma [?] was already proved in [?], in the situation when
the adjunction is a coreflection. For other related results one can consult [?] and
[?], where conditions are given for a functor to preserve openness.

We are now going to construct an adjunction between the categories $\mathbf{TS}_L$ and
$\mathbf{WTS}_L$, to which we will apply Lemma 4.2. The adjunction will be obtained
automatically by noticing that $\mathbf{WTS}_L$ is precisely the Kleisli category for the
monad of a suitable endofunctor $W : \mathbf{TS}_L \to \mathbf{TS}_L$, defined below. Intuitively
speaking, for a transition system $T$, $W(T)$ has the same states but more transitions:
it has a transition $s \stackrel{a}{\longrightarrow} s'$ whenever $s \stackrel{\hat{a}}{\Longrightarrow} s'$ holds in $T$. Hence $W$ can be thought of as
a closure on all "weak transitions" $s \stackrel{\hat{a}}{\Longrightarrow} s'$. Formally, for $T = (S, i, \{\stackrel{a}{\longrightarrow}\}_{a \in L})$
we take:

$$W(T) = (S, i, \{\stackrel{\hat{a}}{\Longrightarrow}\}_{a \in L}).$$
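
The construction of $W(T)$ can be sketched in Python as a saturation of the transition relation. This is a hedged illustration, not the paper's definition verbatim: the encoding of transitions as (source, label, target) triples, the spelling "tau" for the silent action, and the function names are assumptions made for the example, and the weak step is read in the usual Milner style.

```python
TAU = "tau"  # assumed spelling of the silent action


def tau_closure(trans, s):
    """States reachable from s by zero or more silent steps."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for (p, a, q) in trans:
            if p == u and a == TAU and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def saturate(states, trans):
    """Transitions of W(T): a step s -a-> t for every weak step of T from s to t."""
    tau_star = {s: tau_closure(trans, s) for s in states}
    weak = set()
    for s in states:
        # Silent label: zero or more silent steps (so every state gets a tau self-loop).
        for t in tau_star[s]:
            weak.add((s, TAU, t))
        # Visible label: silent steps, one visible step, silent steps.
        for (p, a, q) in trans:
            if a != TAU and p in tau_star[s]:
                for t in tau_star[q]:
                    weak.add((s, a, t))
    return weak


states = {0, 1, 2}
trans = {(0, TAU, 1), (1, "a", 2)}
saturated = saturate(states, trans)
```

In this example the state set is unchanged and only transitions are added; saturating the result a second time adds nothing further.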

For a morphism $f : T_1 \to T_2$ we put:

$$W(f) = f : W(T_1) \to W(T_2)$$

i.e. $W$ takes $f$ to the same function. It can be easily checked that the functor $W$
equipped with two natural transformations $\eta : \mathrm{Id}_{\mathbf{TS}_L} \to W$ and $\mu : W^2 \to W$,
consisting of identity functions

$$\eta_T(s) = s, \qquad \mu_T(s) = s,$$

forms a monad, i.e. satisfies the monad laws (cf. [?]). Now consider
the Kleisli category for this monad, having the same objects as $\mathbf{TS}_L$ and whose
morphisms $T_1 \to T_2$ are all morphisms $T_1 \to W(T_2)$ from $\mathbf{TS}_L$. We obtain for
free the canonical adjunction between $\mathbf{TS}_L$ and the Kleisli category; moreover:

Lemma 4.3. This adjunction is a reflection.

Proof: We will show that the counit $\varepsilon$ is a natural isomorphism, that is, $T \cong W(T)$ in
the Kleisli category. To this aim we will show that the following two morphisms

$$l := \eta_T; \eta_{W(T)} : T \to W(T) \quad \text{and} \quad r := \varepsilon_T = \mathrm{id}_{W(T)} : W(T) \to T$$

compose to identities in the Kleisli category. (The codomains of $l$ and $r$ above are in
the Kleisli category; note that these morphisms are also morphisms in $\mathbf{TS}_L$, but
there they have other codomain objects: $l : T \to W^2(T)$ and $r : W(T) \to W(T)$.)
Start by observing that

$$\eta_{W(T)} = W(\eta_T). \qquad (4)$$

Having this and applying the monad laws, it is easy to show that (composing in
$\mathbf{TS}_L$)

$$l; W(r); \mu_T = \eta_T; \eta_{W(T)}; \mu_T = \eta_T \quad \text{and}$$

$$r; W(l); \mu_{W(T)} = \mathrm{id}_{W(T)}; W(\eta_T); W(\eta_{W(T)}); \mu_{W(T)} = W(\eta_T) = \eta_{W(T)},$$

i.e. $l$ and $r$ compose in both cases to identities in the Kleisli category, which
are as usual units of the monad. □

Finally, one can see that the Kleisli category we are considering is just another
formulation of the category $\mathbf{WTS}_L$ consisting of weak morphisms. Now, applying
Lemma 4.2 to the adjunction between $\mathbf{TS}_L$ and $\mathbf{WTS}_L$ we obtain the following:

Proposition 4.4. Weak bisimilarity coincides with $\mathbf{Bran}_L$-bisimilarity in the category
$\mathbf{WTS}_L$.

Proof: Let $F_W \dashv G_W$ denote the functors of the adjunction, with the left adjoint
$F_W : \mathbf{TS}_L \to \mathbf{WTS}_L$. Observe that $F_W(f) = f$, hence $F_W(\mathbf{Bran}_L) = \mathbf{Bran}_L$.
Now, by Lemma 4.2, two transition systems $T$ and $U$ are connected by a span of
$\mathbf{Bran}_L$-open maps in $\mathbf{WTS}_L$ if and only if

$W(T)$ and $W(U)$ are related by a span of $\mathbf{Bran}_L$-open maps in $G_W(\mathbf{WTS}_L)$. (5)

On the other hand, notice that weak bisimulation could be defined equivalently
(cf. [?]) by replacing in Definition 3.1 requirements 1 and 2 by

1. $s_1 \stackrel{\hat{a}}{\Longrightarrow} s_1'$ in $T_1$ implies $s_2 \stackrel{\hat{a}}{\Longrightarrow} s_2'$ in $T_2$ and $s_1' \sim s_2'$, for some $s_2'$,

2. analogously,

which means that $T$ and $U$ are weakly bisimilar if and only if $W(T)$ and $W(U)$
are strongly bisimilar (more precisely, weak bisimulations between $T$ and $U$ are
precisely strong bisimulations between $W(T)$ and $W(U)$). Hence $T$ and $U$ are
weakly bisimilar if and only if

$W(T)$ and $W(U)$ are related by a span of $\mathbf{Bran}_L$-open maps in $\mathbf{TS}_L$. (6)

Now we only need to fill the gap between (5) and (6), which differ only in the
category the spans of open maps are to come from; it suffices to show (6) $\Rightarrow$
(5). Consider any $\mathbf{Bran}_L$-open morphism $f : V \to W(T)$ in $\mathbf{TS}_L$. Note that it
is necessarily its own adjunct $f^{\sharp} = f : V \to T$ in $\mathbf{WTS}_L$. We will show that
$G_W(f^{\sharp}) = W(f); \mu_T$ is also $\mathbf{Bran}_L$-open. First, $\mu_T$ is open, being an
identity function. Moreover, $W(f) : W(V) \to W^2(T)$ is also open (i.e. zig-zag),
since the zig-zag property from Section [?] can be easily proved to hold also for
sequences of the form [?]. Hence, if $W(T)$ and $W(U)$ are connected by a span
of $\mathbf{Bran}_L$-open maps from $V$, then they are also connected by a span of $\mathbf{Bran}_L$-open
maps from $W(V)$, lying necessarily in $G_W(\mathbf{WTS}_L)$. □

We would like to stress that in the proof we did not need to refer to any explicit
characterization of $\mathbf{Bran}_L$-open maps in $\mathbf{WTS}_L$, although such a characterization
exists and is given by a zig-zag condition similar to that in Section [?]
(cf. [?]): a morphism $\sigma : (S_1, i_1, \to_1) \to (S_2, i_2, \to_2)$ is $\mathbf{Bran}_L$-open in $\mathbf{WTS}_L$ iff for
each reachable $s \in S_1$, whenever $\sigma(s) \stackrel{a}{\longrightarrow} s'$, then $s \stackrel{\hat{a}}{\Longrightarrow} s''$ for some $s'' \in S_1$
satisfying $\sigma(s'') = s'$. Moreover, it is an interesting observation that the choice
of $\mathbf{Bran}_L$ as the path subcategory for weak bisimulation seems to be the only
reasonable one. For instance, one can easily check that we obtain a different
equivalence when we replace $\mathbf{Bran}_L$ by the full subcategory of $\mathbf{WTS}_L$ of finite
linear transition systems. Surprisingly, bisimilarity induced in such a case would
not even relate the following two weakly bisimilar transition systems

    •            • --τ--> •

one of them consisting exclusively of a single initial state.

5 Generalization 

Let $(W, \eta, \mu)$ be a monad on some category $\mathcal{M}$. Let $\mathcal{M}_W$ denote its Kleisli
category and $F_W \dashv G_W$ denote the usual adjunction between $\mathcal{M}$ and $\mathcal{M}_W$.

Motivated by the example of weak bisimilarity and by its equivalent
formulation in points (5) and (6) in the proof of Proposition 4.4, and especially by
observation [?], we propose the following general definition:

Definition 5.1. Objects $A$ and $B$ of $\mathcal{M}$ are $\mathcal{P}$-bisimilar w.r.t. $W$, for a subcategory
$\mathcal{P}$ of $\mathcal{M}$, if $W(A)$ and $W(B)$ are $\mathcal{P}$-bisimilar (in $\mathcal{M}$).

Before we state a generalization of Proposition 4.4 (Theorem 5.2), let us
analyze the properties satisfied by $W$ in the previous section. First, the monad is
idempotent in the sense that $\mu$ is a natural isomorphism there. This is sufficient
in a general situation for (4), hence implies $G_W$ to be full and faithful. Second,
open maps are preserved by $W$, i.e. whenever $f$ is open, $W(f)$ is open as well.
(This fact is implicitly used for proving Proposition 4.4.) Fortunately, these two
properties guarantee that $\mathcal{P}$-bisimilarity w.r.t. $W$ is definable by means of open
maps:

Theorem 5.2. $\mathcal{P}$-bisimilarity w.r.t. $W$ coincides with $F_W(\mathcal{P})$-bisimilarity in the
Kleisli category $\mathcal{M}_W$ when the monad $(W, \eta, \mu)$ is idempotent and $W$ preserves
$\mathcal{P}$-openness.


Abstract. The HIDRA Concurrency Control (HCC) mechanism provides
support for concurrency control in environments where the
coordinator-cohort replication model is being used. This replication model
allows the arrival of multiple invocations to different object replicas which
serve those invocations locally and later make the appropriate checkpoints
on the rest of the replicas. The HCC uses a service serialiser object
(SS) and a set of serialiser agents placed in each replica node. As a result,
since the HCC components are replicated, this mechanism is also fault
tolerant. Each invocation received by an object replica is processed by
the SS, which knows the invocations that are currently being processed.
So, this agent is able to block or allow the execution of arriving invocations
according to their conflicts with the currently active ones and the
concurrency specification made when the object interface was declared.



1 Introduction 

There are many mechanisms to ensure synchronisation in object-oriented
distributed environments. Some of them are based on synchronisation primitives,
like distributed locks with two-phase locking [?], on mutual exclusion algorithms
which use their own protocols [?], or on programming languages with
operation-based synchronisation support [?]. However, these mechanisms either require a
large number of messages to find out which task may access the object or they
do not have enough expressive power to allow multiple synchronisation policies.
The situation is worse if we consider a replicated resource whose access has to be
synchronised. In this case, the typical solution relies either on two-phase locking,
which is very restrictive because all locks have to be acquired before the first one
is released (and this removes the advantages of locks compared to an
operation-based granularity mechanism, such as those provided in several programming
languages), on dynamic voting [7], which implies a read-write locking mechanism,
or on optimistic approaches [?], which may lead to abortion of requests.

The HIDRA Concurrency Control (HCC) mechanism synchronises the
accesses to replicated objects using an operation-based granularity. Since HIDRA
uses an object request broker (ORB) to manage remote object invocations, HCC
is based on extensions to the CORBA [?] interface definition language (IDL). So,
it is independent of the programming language, and the programmer has to
deal with synchronisation features only when the interface is being defined. When
the objects are being implemented, no care has to be taken about synchronisation.

* This work was partially supported by the CICYT (Comisión Interministerial de
Ciencia y Tecnología) under project TIC96-0729.

HCC has been included in HIDRA because our architecture provides support
for object replication. Thus, our concurrency control mechanism uses a special
object that serves serialisation requests. These requests are made before the
actual replicated object is invoked and their execution thread is associated with that
replicated object invocation. The serialiser checks if there are any invocations
(either blocked or currently being executed) that have a conflict with the
invocation being serialised. If so, this invocation is blocked; otherwise, it is
allowed to go on.

The rest of the paper is organised as follows. Section 2 describes the HCC
mechanism. Section 3 shows some synchronisation techniques used in other
distributed environments and, finally, Sect. 4 gives the conclusions.



2 The HCC Mechanism 

The HCC mechanism is needed in HIDRA to serialise all requests that arrive
at replicas of an object that uses the coordinator-cohort replication model. In
this replication model, an invocation is initially served by only one replica, which
processes the request and makes at least one checkpoint to transfer the state
updates to the other object replicas. Each object invocation may be served by
a different replica. So, multiple invocations may be executed concurrently in all
replicas of the object and some distributed concurrency control mechanism is
needed.

To decide which operations may proceed simultaneously, an extension of the
IDL language is used, providing information about which pairs of operations
are mutually conflicting. Basing the concurrency on this property allows the
implementation of multiple synchronisation strategies, such as mutual exclusion,
readers-writer policy, FCFS policy, etc.



2.1 Objectives 

As previously stated, HCC is a concurrency control mechanism that manages 
replicated object invocations in distributed environments. Its main objectives 
are: 



— The mechanism has to use a pessimistic approach. The object invocation
mechanism used in HIDRA assumes that an invocation will never be aborted;
this prevents the use of optimistic techniques to manage concurrency.

— The mechanism has to be fault-tolerant; i.e., the failure of some of the
components needed by our mechanism has to be tolerated.

— We have to reduce the possibility of misuse of the mechanism by the
programmer. So, the management of the concurrency control tasks has to be as
transparent as possible to the programmer.

— Efficiency. The number of messages needed to carry out the concurrency
control tasks has to be kept to a minimum.



    <op_dcl>   ::= [ <op_scope> ] [ <op_attribute> ] <op_type_spec> <identifier>
                   <param_dcls> [ <raises_expr> ] [ <context_expr> ]
                   [ <cnc_expr> ] [ <cfl_expr> ]

    <op_scope> ::= "local"
    <cnc_expr> ::= "concurrent" "(" <scoped_name> { "," <scoped_name> }* ")"
    <cfl_expr> ::= "conflicts" "(" <scoped_name> { "," <scoped_name> }* ")"

Fig. 1. Syntax of the extended operation declaration.



2.2 Extensions to IDL 

The IDL extensions enlarge the optional parts of an operation declaration to in- 
clude which other operations of each interface instance can be executed concur- 
rently and which operations in different objects cannot proceed simultaneously. 

The HCC mechanism assumes initially that all operations of the same object 
are mutually exclusive. All other operation invocations can proceed concurrently. 
As a result, all non-extended interfaces are interpreted by the HCC as specifica- 
tions of objects whose state is protected by exclusive operations. 

The new syntax for an operation declaration appears in Fig. 1, where the
new local, concurrent and conflicts clauses are shown.

The local keyword means that this operation only has to access one of the 
object replicas. Thus, other replicas do not have to wait for a checkpoint that 
notifies the termination of that call. 

The concurrent expression gives the list of operations (that by default are
in conflict with the operation being declared) which can proceed simultaneously
with this operation. For each pair of concurrent operations, this expression
only needs to appear in the declaration of one of them.

IDL allows interface inheritance. By default, the HCC considers that no two
operations of any of the interfaces of an object can be executed concurrently. So, to build the
list of concurrent operations we need scoped names because we have to identify
the interface which provides the concurrent operation (it may be any ancestor
interface in the hierarchy of interfaces provided by the object).

The conflicts expression gives the list of operations (that by default are
allowed to proceed concurrently) which are now in conflict with this operation.
In this case, it is assumed that the operations in conflict are provided by two
different objects, but these objects share some state and these operations
access this shared state. Again, we need scoped names to identify correctly the
operations in conflict.
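
The default rules and the two clauses can be summarised in a small conflict table. The Python sketch below is only an illustration of the rules described above, not the HIDRA implementation: the class, its method names and the scoped operation names (`Buffer::Print`, `Log::Flush`, and so on) are hypothetical, and a conflicts declaration is simplified to apply to any two distinct objects providing the named operations.

```python
class ConcurrencySpec:
    """Conflict table derived from `concurrent` and `conflicts` clauses (illustrative)."""

    def __init__(self):
        self._concurrent = set()  # unordered pairs allowed together on the same object
        self._conflicts = set()   # unordered pairs forbidden together on different objects

    def declare_concurrent(self, op, *others):
        # Storing unordered pairs means one declaration covers both operations.
        for other in others:
            self._concurrent.add(frozenset((op, other)))

    def declare_conflicts(self, op, *others):
        for other in others:
            self._conflicts.add(frozenset((op, other)))

    def can_be_concurrent(self, first, second):
        """Each invocation is an (object id, scoped operation name) pair."""
        (obj1, op1), (obj2, op2) = first, second
        pair = frozenset((op1, op2))
        if obj1 == obj2:
            # Default: operations of the same object are mutually exclusive.
            return pair in self._concurrent
        # Default: operations of different objects may proceed concurrently.
        return pair not in self._conflicts


spec = ConcurrencySpec()
spec.declare_concurrent("Buffer::Print", "Buffer::Print", "Buffer::List")
spec.declare_conflicts("Buffer::Insert", "Log::Flush")
```

Here `Buffer::List` and `Buffer::Print` may run together on the same object although only `Buffer::Print` declares it, while `Buffer::Insert` on one object and `Log::Flush` on another are kept apart by the conflicts declaration.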
    struct InvoCtxt {
        CORBA::TypeId Interface;
        ObjectID      ObjID;
        long          Operation;
    };

    interface ServiceSerialiser { // pseudo IDL
        void Serialise( in RoiID    InvoID,
                        in InvoCtxt Invocation,
                        in TObj     TerminationObject );
    };

Fig. 2. Interface of the service serialiser object.



2.3 HCC Components 

The HCC mechanism relies on some components that maintain and manage 
the information needed to allow or suspend invocations on the objects to be 
controlled. These components are: 

— Serialiser object (SS). This object is created when a service (a group of 
inter-related objects) is registered in the system and it has to decide which 
invocations on the replicas of the objects that compose that service may 
proceed. 

As the requests arrive at the object replicas, the ORB invokes the serialiser
providing information about which object instance is being invoked, which
operation and which invocation identifier is being used. The serialiser checks
if the incoming invocation conflicts with any of the active invocations
and, if so, blocks the incoming one. As a result, each serialiser has to maintain
the identifiers of a collection of active (and still non-terminated) invocations
and also the identifiers and execution threads of all blocked invocations.
These are the blocked and active lists, and they constitute the dynamic state
of the serialiser.

— ORB machinery on the server side. Before an invocation reaches the actual 
object it is calling, the ORB components placed on the server side have to 
identify and call the appropriate SS. When the call to the serialiser returns, 
the ORB machinery can invoke the actual object replica. 

2.4 Serialisation of Requests 

The HCC is managed by HIDRA's ORB components, becoming a transparent
service for the application programmer. Only one requirement is imposed on the
programmer of replicated objects: she or he has to specify in the interface
declaration which operations are incompatible, as described in Sect. 2.2. Our
extended interface compiler generates the CCS object that has to be provided
when a replicated service is registered in a running HIDRA system. As a result
of this registration, the service serialiser (SS) is created and it receives the CCS
object that it uses to make the concurrency control decisions.

    interface CCS { // pseudo IDL
        boolean CanBeConcurrent(
            in InvoCtxt FirstInvocation,
            in InvoCtxt SecondInvocation
        ) raises (UnknownInterface, BadOperationNumber);
    };

Fig. 3. Interface of the CCS objects.

The serialisation of a request is made when that invocation arrives at the
domain where the replica of the invoked object resides. The ORB components
call the Serialise() operation of the SS. The declaration of this operation is
given in Fig. 2. The arguments needed by this SS operation are the following:

— InvoID. A reference to a RoiID object [?] that identifies the current invocation
being serialised. We use the acronym ROI (Reliable Object Invocation)
to refer to an invocation on a replicated object.

— Invocation. This structure maintains an invocation context and is composed
of the following fields:

• Interface. This value identifies the interface that is being invoked.

• ObjID. This value is internal to the ORB and identifies the specific
instance that is being invoked.

• Operation. The number of the operation that is being invoked in the interface
Interface.

— TerminationObject. An object needed to detect when this invocation has
terminated in all object replicas. See [?] for details on this object.

The information maintained in an invocation context is needed to identify 
the possible conflicts with other previous ROIs. 

Once the call to the Serialise() operation arrives at the SS, it follows these
steps:

1. All invocation contexts in the active and blocked lists are inspected and a
call to the CanBeConcurrent() operation of the CCS object is made (see
Fig. 3) to test if the current invocation and the inspected one can proceed
at the same time.

2. If the two tested operations cannot be concurrent, the identifier
of the operation (its RoiID reference) in the active or blocked lists is inserted
in a set of precedent operations associated with the current one.

3. When the two lists have been scanned, if the precedent operations set is
empty, this operation is inserted in the list of active operations and its
Serialise() invocation is replied to. However, if the precedent operations set
is not empty, the operation is inserted in the list of blocked operations. It
will remain there until all the operations in its precedent set have
terminated. When this happens, the invocation context is moved to the active
list and the Serialise() invocation is also replied to.

4. The SS uses the TerminationObject associated with each ROI to find out when
that invocation has finished. That happens when this object receives
the unreferenced notification, as described in [?]. In this case, its RoiID
is removed from the active list and from all precedent sets where it can be
found.
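
The four steps above can be sketched as a small single-process model. This Python code is a hedged illustration rather than the HIDRA serialiser: the class and method names are hypothetical, the CCS object is replaced by a plain callable, and blocking an execution thread is modelled by returning `False` from `serialise` instead of suspending the caller.

```python
class ServiceSerialiser:
    """Toy model of the active and blocked lists with precedent sets."""

    def __init__(self, can_be_concurrent):
        self.can_be_concurrent = can_be_concurrent  # stands in for the CCS object
        self.active = {}    # ROI id -> invocation context
        self.blocked = {}   # ROI id -> (invocation context, precedent set)

    def serialise(self, roi_id, ctxt):
        """Steps 1 to 3. Returns True if the invocation may start immediately."""
        precedents = set()
        for other_id, other in self.active.items():
            if not self.can_be_concurrent(ctxt, other):
                precedents.add(other_id)
        for other_id, (other, _) in self.blocked.items():
            if not self.can_be_concurrent(ctxt, other):
                precedents.add(other_id)
        if not precedents:
            self.active[roi_id] = ctxt
            return True
        self.blocked[roi_id] = (ctxt, precedents)
        return False

    def terminated(self, roi_id):
        """Step 4. Returns the ids of the blocked ROIs that become active."""
        self.active.pop(roi_id, None)
        released = []
        for blocked_id, (ctxt, precedents) in list(self.blocked.items()):
            precedents.discard(roi_id)
            if not precedents:
                del self.blocked[blocked_id]
                self.active[blocked_id] = ctxt
                released.append(blocked_id)
        return released


# Mutual exclusion: no two invocations may ever run together.
ss = ServiceSerialiser(lambda first, second: False)
```

With this always-conflicting specification, successive calls to `serialise` queue up behind each other and are released one at a time by `terminated`, in arrival order.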
2.5 Expressive Power

In [?], the expressive power is defined as the ability of a synchronisation
mechanism to implement a range of synchronisation policies. The wider the range of
synchronisation policies a mechanism can implement, the greater its expressive
power will be.

Bloom [3] gave some criteria to identify the expressive power of a particular
synchronisation mechanism and proposed that six different types of information are
necessary for good expressive power. These types are: the name of the
invoked operation, the relative arrival time of invocations, the invocation
parameters, the synchronisation state of the resource, the local state of that resource
and history information about ROIs already terminated.

The HCC is able to manage four of these six types of information.
It uses the names of the operations being invoked (in our case they are given by
the TypeId of the interface and the operation number), the relative arrival
time of invocations (as they are received, the precedent set is built and thus the
relative arrival time is maintained), the synchronisation state (because the HCC
maintains which invocations are active and which others are already serialised
but have not yet been started) and it is also able to maintain the history
of past invocations on each replicated object.

With all that information, the HCC can implement different synchronisation
policies very easily. For instance, two of the most common synchronisation
policies are mutual exclusion and readers/writer. To implement mutual exclusion
no special action has to be taken in HCC, because it is the default policy. So, for
the interface given in Fig. 4a, all operations are considered mutually exclusive
and their invocations are serialised in FCFS order.



(a)
    interface BoundedBuffer {
        void InsertItem(in Item TheItem);
        Item GetItem();
        void PrintBuffer();
        Item ListItem(in long Position);
        void PrintItems(in long First,
                        in long Last);
    };

(b)
    interface BoundedBuffer {
        void InsertItem(in Item TheItem);
        Item GetItem();
        void PrintBuffer()
             concurrent (BoundedBuffer::PrintBuffer);
        Item ListItem(in long Position)
             concurrent (BoundedBuffer::ListItem,
                         BoundedBuffer::PrintBuffer);
        void PrintItems(in long First, in long Last)
             concurrent (BoundedBuffer::PrintBuffer,
                         BoundedBuffer::PrintItems,
                         BoundedBuffer::ListItem);
    };

Fig. 4. Example of interface declaration with: (a) mutual exclusion policy, (b)
readers-writer policy.



The first two operations modify the state of the buffer, while the other three
only read this state. So, we can modify the previous declaration to enforce a
readers/writer policy. The resulting declaration is shown in Fig. 4b. In this case,
the operations InsertItem() and GetItem() cannot be executed concurrently
with any other operation of the same interface because they modify the state
of the bounded buffer. On the other hand, PrintBuffer(), ListItem() and
PrintItems() only read the state of the object and can be executed concurrently.

2.6 Fault Tolerance 

To achieve fault tolerance, a representative of the SS is added in each replica 
node. This SS agent (or SSA, for short) maintains part of the dynamic state of 
the SS, enabling its reconstruction in case of failure. When SSAs are considered, 
the synchronisation tasks are modified in the following way: 

— All serialisation requests are initially managed by the SSA placed in the
node of the coordinator replica. These SSA objects forward the serialisation
request to the unique SS object, except in the case of an invocation of a
local operation (see Sect. 2.2). The SS does not suspend the execution
thread in case of conflicts. It only takes note of this situation and replies
immediately, returning the list of precedent invocation contexts.

— The SS maintains in its blocked and active lists information about all the
non-local invocations. The SSA maintains in these lists only the information
regarding the ROIs with their coordinator in its local node. It also blocks
the execution threads associated with the ROIs placed in its blocked list.

— The SSA extends its pseudo-interface to provide locally invocable operations
to get and manage references to the TObj objects associated with the ROIs
which have a cohort in its node.

These operations are needed by the SSA to know when a given invocation has
finished. When that event happens, the ROI is removed from the precedent
sets associated with blocked ROIs and it is also removed from the active list.

In case of failure of the service serialiser or several of its agents, some special
actions are needed to reconfigure the state of the HCC components. According
to the failure type, two cases are distinguished.



Failure of an SSA. When an SSA crashes, the whole node where it resides has
crashed, because our ORB support is in the kernel domain. So, all coordinator
replicas for the ROIs controlled by this SSA have also crashed.

We need to replace the faulty SSA because it controls the activation of the
blocked ROIs when their precedent operations sets become empty. To this end, we
have to describe how a ROI is restarted when its coordinator replica has crashed.

If the ROI still remained blocked, no special action has to be taken. If the
client that initiated the ROI is alive, it will reinitiate the invocation on another
replica. Since the RoiID is maintained by the client, the new attempt is identified
as a replay by the SS and an updated precedent operations set is returned to
the newly chosen coordinator's SSA.

If the ROI was already active, our ORB support will choose another
coordinator replica and no serialisation request is initiated to do so. When the client
reinitiates the invocation on another coordinator replica to pick up the results of
the previous attempt, its serialisation request will be replied to immediately by the
SS, as in the case described in the previous paragraph.

Failure of the SS and some SSAs. As previously shown, when this failure 
happens no special action has to be taken to rebuild the state of the crashed 
SSAs because the ROI mechanism chooses another coordinator replica for all 
ROIs involved in the crash. 

However, the dynamic state of the SS has to be rebuilt. This state consists 
of the active and blocked lists of ROIs. Both lists have to be rebuilt using the 
information maintained by the surviving SSAs. In these SSAs, we can find: 

— All the information about the blocked ROIs whose coordinator replica is
placed on the same node. This includes the RoiID and InvoCtxt of all the
precedent operations in each precedent set and the RoiID, InvoCtxt and
TObj of the blocked ROI.

— The RoiID and TObj references for the currently active ROIs that have made
at least one checkpoint and still have not made the last checkpoint.

Thus, when the SS has crashed, one of the remaining SSAs is promoted
to the SS class. To rebuild its active list, the following steps are taken in the
reconfiguration phase of the cluster:

1. All surviving SSAs are queried and each of them returns a list with all
RoiIDs that have an associated TObj reference. For each one of these ROIs
the SSAs return its RoiID, its TObj reference and (if it can be locally found)
its InvoCtxt.

2. All the RoiIDs returned in the previous step are inserted in the active list and
a TObj object replica is regenerated from its reference and associated with
its RoiID and InvoCtxt.

To rebuild the blocked list, this sequence of steps is needed:

1. All surviving SSAs are queried and each of them returns all its blocked
ROIs and the precedent set for each of these ROIs.

2. All these blocked lists are merged to build the blocked list of the new SS.
Thus, the precedent sets for a given ROI are compared and the resulting
precedent set only has the ROIs that could be found in all the merged
precedent sets (we assume that if a given ROI is missing in any precedent set,
then this ROI was detected as terminated by that SSA, which removed it
from that precedent set).

3. Finally, all precedent sets are checked to find out if they have some ROI that
appears neither in the active nor in the blocked list. If that happens, that
ROI is removed from the precedent sets because it corresponds to a ROI
that was active but had not yet made any checkpoint and whose coordinator
replica crashed. A ROI of this class has to be reinitiated and serialised again.
When some ROI of this kind is found, the new SS also has to invoke the
Terminated() method of all the SSAs using its associated RoiID as input
argument. This call removes the ROI from the active lists of all SSAs.

Once these two protocols have been executed, the new SS has a dynamic
state that allows the service of new serialisation requests of the HCC.
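
The merge used to rebuild the blocked list can be sketched as follows. This Python fragment is an illustrative reading of the three steps, not code from HIDRA: the function name, the representation of each SSA report as a dictionary from blocked ROI identifier to precedent set, and the sample identifiers are all assumptions made for the example.

```python
def rebuild_blocked(reports, active_ids):
    """Merge the blocked lists reported by the surviving SSAs.

    reports    -- one dict per SSA, mapping a blocked ROI id to its precedent set
    active_ids -- ROI ids already placed in the rebuilt active list
    Returns the merged blocked list and the ROI ids that must be reported
    as terminated to all SSAs.
    """
    merged = {}
    for report in reports:
        for roi, precedents in report.items():
            if roi in merged:
                # Step 2: keep only the ROIs found in every reported precedent set.
                merged[roi] &= set(precedents)
            else:
                merged[roi] = set(precedents)

    # Step 3: drop precedents that are neither active nor blocked.
    known = set(active_ids) | set(merged)
    to_terminate = set()
    for precedents in merged.values():
        lost = precedents - known
        to_terminate |= lost
        precedents -= lost
    return merged, to_terminate


reports = [
    {"r3": {"r1", "r2"}},
    {"r3": {"r1", "r9"}, "r4": {"r3", "r7"}},
]
blocked, lost = rebuild_blocked(reports, active_ids={"r1"})
```

In this example `r3` keeps only `r1` in its precedent set, because `r2` and `r9` are each missing from one report, and `r7` is dropped from the precedent set of `r4` and returned for termination because it is neither active nor blocked.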

3 Related Work 

HIDRA needs a pessimistic concurrency control mechanism to synchronise the
access to replicated objects that follow the coordinator-cohort replication model.
The HCC provides such a mechanism with operation granularity.

Other concurrency control mechanisms for replicated objects exist, but most
of them are used in databases and are based on quorum consensus
[?]. These replication models assign a vote to each replica and divide the
operations into only two categories (read and write). Each time a read (write) operation
must be made, the operation has to access a read (resp. write) quorum
of replicas. The property that has to be satisfied by these algorithms is that
the sum of the two quorums must exceed the total sum of votes and that the
write quorum must be greater than half the sum of all votes. The operations are
allowed to proceed if they have collected the required vote quorum.
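
The two quorum conditions stated above amount to a pair of inequalities, which the following Python sketch checks. The function name and the vote assignments are hypothetical and serve only to illustrate the arithmetic.

```python
def valid_quorums(votes, read_quorum, write_quorum):
    """Check the two conditions on read and write quorums.

    votes is the list of votes assigned to the replicas.
    """
    total = sum(votes)
    reads_see_writes = read_quorum + write_quorum > total  # read and write quorums overlap
    writes_exclusive = 2 * write_quorum > total             # two write quorums overlap
    return reads_see_writes and writes_exclusive


three_replicas = [1, 1, 1]
```

For three replicas with one vote each, a read quorum of 2 with a write quorum of 2 satisfies both conditions, whereas a read quorum of 1 with a write quorum of 2 fails the first one, since a read could miss the latest write.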

More advanced techniques are discussed in [?], where two approaches are
described: conflict-based and state-based validation. In the first case, operations are
allowed to proceed concurrently if they commute; i.e., if they do not conflict. This
approach is equivalent to ours. The state-based validation needs to know which
parts of the state are affected by each invocation. In this case, the concurrency
control mechanism needs the value of the arguments of each invocation and the
current state of each object being invoked. Although this technique allows even
greater concurrency than the conflict-based one, the amount of information that
needs to be managed and the access to the object state make it infeasible in our
environment.

Finally, the replication model also affects the concurrency control mechanism. 
In the passive and active replication models only a local concurrency control 
mechanism is needed. The coordinator-cohort replication model, however, is not so 
simple. A concurrency control mechanism for this replication model was already 
given in [?]. It is based on controlling the data dependencies and precedence 
dependencies between the operations being requested. A data dependency exists 
between two operations when one of them requires the result of the other 
before it can be started. Precedence dependencies exist between two operations if 
they conflict. Although precedence dependencies are already controlled by HCC, 
data dependencies need some control over the arguments of the operations. This 
enlarges the amount of data that must be managed by the concurrency control 
mechanism and does not improve concurrency much. 



4 Conclusions 

The HCC mechanism provides easy-to-use concurrency control support for 
the programmer of replicated objects. The programmer only has to worry about 
concurrency when the interface of the replicated objects is being declared; all 
other support is transparently provided by HCC. 

The concurrency control is provided at operation granularity and it allows 
the implementation of multiple concurrency control policies. Additionally, the 
objects involved in the HCC support are fault-tolerant, which results in an 
appropriate concurrency control mechanism for the coordinator-cohort replication 
model of HIDRA. 

Although other concurrency control mechanisms may be found for replicated 
object management, HCC is more convenient for the programmer, 
requires less message interchange among the agents involved in the concurrency 
control, or provides support for a greater number of synchronisation policies. 


Abstract. Knowledge discovery in databases (KDD) and data mining have become 
very important in data processing and analysis. The combination of powerful 
KDD and data mining techniques with the sophisticated resources of object-oriented 
database systems brings even more considerable results. But these emerging 
tools and techniques require a powerful data mining query language which 
would serve as an interface between applications and data mining tools. This 
motivates us to propose general conditions and instruments for extending an 
object-oriented query language (OOQL) with the ability of data mining. These 
instruments will be introduced in two examples of an object-oriented data 
mining query language: ODAMIL, an extension of the Object Comprehensions 
language, and DMOQL, an extension of the OQL language proposed by the 
ODMG. 



1 Introduction 

Knowledge Discovery in Databases (KDD, [1]) is the general process of discovering 
useful knowledge from data. This process involves data pre-processing, data mining 
itself and interpretation of the mined patterns. Pattern interpretation is necessary for 
distinguishing which patterns constitute knowledge and which do not. Data mining is 
only one part of the KDD process. It analyses pre-processed data and produces 
information patterns which are then interpreted. 

Development of modern database systems, e.g. object-oriented databases (OODB), 
has advanced considerably. It is natural then to investigate knowledge discovery in 
OODB ([2]). The OODB offers richer structure and semantics which can be employed 
in the KDD process. 

At present there are many practical applications employing or based on 
KDD technology. The sheer number of different applications with different requirements on 
the KDD system calls for the introduction of a standard which could be called 
a Data Mining Query Language (DMQL). The DMQL would offer a standard interface 
between an application and the KDD system. The design of such a language for data mining in 
relational databases is described in [3]. 

The combination of modern object-oriented database technology and data mining, 
together with the need to create a query language for data mining, motivates us to 
propose general conditions and instruments for extending an object-oriented query 
language (OOQL) with the ability of data mining. This would serve the development 
of an OOQL for data mining which enables an application to obtain knowledge from 
an OODB in a standard, simple way, similar to retrieving stored data from an OODB 
using OOQL. 

These instruments will be described and then introduced in two examples: object- 
oriented data mining query languages ODAMIL and DMOQL. The language 
ODAMIL (Object-oriented DAta Mining query Language) is an extension of OOQL 
Object Comprehensions ([4]). DMOQL (Data Mining OQL) is an extension of OQL 
language ([5, 6]) proposed by the ODMG group. 

The second section deals with the data mining extension, emphasising three basic parts. 
The first one is the data mining input, i.e. the OODB being used. The second one is 
the data mining output, i.e. particular types of mined rules. The third one covers the 
necessary extensions added to the OOQL. The third and fourth sections introduce two 
examples of extending OOQL with data mining capabilities: the languages DMOQL and 
ODAMIL. Finally, the fifth section gives some examples of the usage of the DMOQL 
and ODAMIL languages. 



2 Data Mining Extension 

Looking at data processing, we can see that it is a process taking a database as both 
input and output, i.e. consuming and producing data. An application issues a query 
written in OOQL, which causes a database engine to pick some data from a database 
and give it to the application. 

Data mining can also be viewed as a process. But this time it takes a database as 
input and produces mined knowledge as output. Using some data mining query 
language, an application would issue a query written in this language and the query 
would cause some data mining program to pick data from a database, analyse it and 
give the mined knowledge to the application as the result of the query. 

The data mining query language would enable an application to get knowledge 
from an OODB in a standard way, similar to retrieving stored data from an OODB 
using OOQL. This is the reason for extending OOQL with data mining ability. This 
extension is possible provided that several conditions are satisfied; they are described in 
the following sections. 



2.1 Database 

An object-oriented database ([7, 8]) will be the input for the data mining process. No 
additional adaptations of the database are required for data mining purposes: no 
completion or adding of attributes, methods or objects, and no changes 
to already existing data or metadata. The only exception is the addition 
of output definitions, i.e. definitions of mined knowledge and rules, which will be 
described in the following section. 

The data mining program takes the OODB as it is. It does not need to care about the database 
structure, the type or origin of stored data, etc. It retrieves data from the OODB using OOQL 
in the same way as any other database application. So from the data mining point of 
view no other change concerning the OODB is required. 

It is only necessary to give the data mining program access to metadata. The data 
mining program needs the description of the analysed data, i.e. the objects it is working with. 
It needs to know the description and structure of the classes to which the analysed objects 
belong. This information about object attributes and their types, and about methods and 
their parameters, is important for the proper choice of data mining techniques and 
algorithms and for proper treatment of the analysed data. 

Knowing the metadata of the analysed objects, the data mining program can choose 
an optimal data mining strategy and mine interesting knowledge effectively. 



2.2 Mined Rules 

Data mining output is represented by newly obtained knowledge in the form of mined 
rules. Formally, a rule is an expression E in a language L describing the facts in a certain 
subset of data from the database ([1]). 

For example, the expression "If a customer buys bread, he will also buy milk with 
a probability of p percent" can be a rule for an appropriate choice of threshold p. 

Rules are characterised by two parameters. The rule support expresses 
how strongly the rule is backed by the data: the higher the rule support, the lower the probability 
that the rule was deduced accidentally from only a few transactions. The rule confidence 
expresses the degree of correlation in the database between the items on the left 
and right sides of the rule. Higher confidence again means higher rule quality. 
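
The two parameters can be made concrete with a small calculation. The sketch below assumes the usual definitions from the association rule literature, which the text above describes only informally: support is the fraction of all transactions that contain every item of the rule, and confidence is the fraction of transactions containing the left side that also contain the right side. The transactions and function names are invented for illustration.

```python
def support(transactions, lhs, rhs):
    """Fraction of all transactions that contain both sides of the rule."""
    both = sum(1 for t in transactions if lhs <= t and rhs <= t)
    return both / len(transactions)


def confidence(transactions, lhs, rhs):
    """Fraction of transactions containing lhs that also contain rhs."""
    with_lhs = [t for t in transactions if lhs <= t]
    both = sum(1 for t in with_lhs if rhs <= t)
    return both / len(with_lhs)


# Hypothetical purchase transactions (sets of bought items).
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk"},
]

# Rule "if a customer buys bread, he will also buy milk".
rule_support = support(transactions, {"bread"}, {"milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here two of the four transactions contain both bread and milk, and two of the three bread transactions also contain milk, so the rule has a support of one half and a confidence of two thirds.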

There are various rule types ([1]) focusing on particular aspects of the relations among 
data, data structure and content. It is not necessary to include all these rule types in the 
data mining extension of OOQL. We can restrict ourselves to a limited subset of 
these rules. In the case of the ODAMIL and DMOQL languages, mining of only a limited 
number of rule types is assumed: association, sequential and classification rules are 
to be mined. 

The concept of knowledge and rule specified in this formal way must be expressed 
and defined by means of the OODB, i.e. described in the data definition language (DDL). 
The result type must be known at the moment a query is issued. In the 
case of OOQL the output is a collection of objects of some already defined class. In 
the case of data mining the output is a set or collection of rules. These rules must 
first be defined and their definitions, in other words the rule metadata, added to the OODB 
metadata. Mined rules are formally treated as objects of some class returned as 
a query result. That is why it is necessary to know the rule metadata in advance. 



2.3 Query Language Extension 

The OOQL serves for data retrieval. But knowledge is nothing other than another sort 
of data. Mined knowledge is described by metadata stored in the OODB and hence 
knowledge can be treated in the same way as 'normal' data. From this point of view 
there is no difficulty in incorporating knowledge discovery into data retrieval, i.e. 
extending OOQL with data mining ability. 

Some common features are required for such an extension. Adding rule 
metadata definitions to the OODB is the first one, as mentioned before. The OOQL 
extension itself consists in enriching the OOQL with a new construct performing 
data mining. This new construct, let us say a MINE command, would be similar to the 
SELECT statement of the SQL language, or its equivalent in a particular OOQL, 
which retrieves data from a database. 

There are three possible issues with this addition. The first one is not actually 
a problem. Both the OOQL constructs and the MINE command retrieve data from the OODB, i.e. 
they behave in the same way. The only difference is that OOQL puts data immediately into 
the output, while the MINE command analyses it first, extracts rules from it and then puts them 
into the output. So from the point of view of the OODB and its handling there is no problem. 

The second one is the output problem. OOQL produces data while the MINE 
command produces knowledge. But knowledge is formally similar to data, as 
mentioned before, because both data and knowledge are defined by metadata and are 
formally treated as objects of some classes. Thus the difference between data and 
knowledge disappears, and the data mining process inside the MINE command 
is of no concern to the caller. 

The third and last problem is the possibility of adding a new element to 
OOQL. But OOQLs usually have some constructs, such as user defined functions, 
which can be utilised for the purpose of the data mining extension. Adding the 
MINE command is then neither very demanding nor artificial. In addition, the OOQL 
interpreter would have to be modified or rewritten anyway because, in fact, we define 
a new OOQL requiring a new interpreter, so incorporating a new construct is not 
very burdensome. 

Of course, the MINE command must have some parameters determining the required 
rule type, the relevant data set and other necessary auxiliary components of data mining 
algorithms, such as various thresholds. These can be 
entered as parameters or primitives of the MINE command. Depending on the number and 
type of mined rules, it is possible to add either a single MINE command or several separate 
commands, one for mining each rule type. 

So adding the MINE command is possible and is not very demanding. There are 
two other difficult tasks. The first one is choosing or designing suitable data 
mining algorithm(s) for performing data mining and producing the required knowledge. 
The second one is the presentation and utilisation of mined knowledge. These tasks 
are not covered in this paper. 

Once the OOQL is extended with the command(s) for data mining, an application can 
use this feature and obtain knowledge from the OODB in the same simple way as 
retrieving data. 

The principles mentioned above are introduced in the following two sections in two 
examples of object-oriented data mining query languages: ODAMIL and DMOQL. 
Instead of the abstract MINE command, a function MineRules is used as 
a representative of the data mining command. 



3 The DMOQL Language 

The DMOQL language is derived from the OQL language, so we first 
briefly introduce this object-oriented query language developed by the ODMG group. 



3.1 The OQL Language 

ODMG-93 is a reference standard built upon the existing SQL-92 ([9]), OMG 
and ANSI programming language standards. It also includes OQL (Object Query 
Language). OQL is an adaptation of the SQL-92 query language and is extended with 
all features of the ODMG object model. It includes the ability to use operation 
invocation in queries, to query over object inheritance hierarchies, to invoke 
inter-object relationships, and to query over arbitrary collections. 

OQL is an SQL-like declarative (nonprocedural) language that provides a rich 
environment for efficient querying of database objects, including high-level 
primitives for object sets and structures, while retaining compatibility with the SQL- 
92 SELECT syntax. 

OQL is a language where operators can be freely composed, as long as the 
operands respect the type system. This is a consequence of the fact that the result of 
any query has a type which belongs to the ODMG type model, and thus can be 
queried again. 

OQL provides a superset of the SQL-92 SELECT syntax. This means that most 
SQL SELECT statements which run on relational DBMS tables work with the same 
syntax and semantics on the ODMG collection classes. 



3.2 The DMOQL Language Syntax 

The DMOQL language is proposed as an extension of the object-oriented query language 
OQL ([5, 6]). DMOQL is designed in a general way and can be extended for mining 
of arbitrary rule types. This paper covers mining of the basic rule types, 
namely association, classification and sequential rules. It is also possible to specify 
necessary auxiliary components of data mining algorithms, such as various 
thresholds. 

The DMOQL language design employs one feature of the OQL 
language: OQL makes it possible to invoke a function inside a query. That is why 
the DMOQL language defines a new function MineRules. Similarly to the select 
statement, which retrieves data from the OODB, the MineRules function performs data 
mining and extracts knowledge from the OODB. Necessary additional information is 
passed through the parameters of the MineRules function. 

The MineRules function definition is given here using a rather informal BNF 
notation. { symbol } means a sequence of zero or more symbols. [ symbol ] means an 
optional symbol. 

query        ::= MineRules (RuleType, TargSet 
                            [, Thresholds] [, RelatedAttrs]) 
RuleType     ::= 'association' | 'sequential' | 'classification' 
TargSet      ::= query 
Thresholds   ::= set (ThrDef {, ThrDef}) 
ThrDef       ::= ThrID float_literal 
ThrID        ::= 'minsupp' | 'minconf' 
RelatedAttrs ::= set (identifier) 

The MineRules function has four parameters with the following meaning: 

• Parameter RuleType determines the type of mined rules. Particular values 
determine the type of mined rules according to their names. 

• Parameter TargSet defines the target data set. It is essentially a subquery in the 
OQL language. The subquery is evaluated and the resulting collection or set of 
objects represents the target data set. The target data set contains the data which are 
currently of interest for data mining, i.e. the relevant data which are 
analysed and from which rules are mined. 

• Optional parameter Thresholds contains particular threshold definitions. If this 
parameter is empty, implicit thresholds are used. Otherwise it contains a set of 
definitions, each of which sets one threshold. 

There are two threshold types, namely minimal support (determined by keyword 
minsupp) and minimal confidence (determined by keyword minconf). This means 
that rules are put into the output of the MineRules function as knowledge 
only if their support and confidence are greater than the given minimal thresholds. 
A threshold value represents a percentage but is written as a number in the 
interval 0 to 1. 

• Parameter RelatedAttrs contains the list of attributes according to which the 
classification of the target data set will be performed. It is entered only for mining of 
classification rules. In this case the target data set is first processed by a classification 
algorithm ([10, 11]). The algorithm classifies the data set, i.e. divides it into 
categories according to the entered attributes. After this, a suitable data mining algorithm 
is executed which mines classification rules for each data category. 



4 The ODAMIL Language 

The ODAMIL language is proposed as an extension of the object-oriented query 
language Object Comprehensions ([4]). The ODAMIL language enables mining of 
the basic rule types, namely association, classification and sequential rules, but it can 
be extended for mining of arbitrary rule types. It is also possible to 
specify necessary auxiliary components of data mining algorithms, such as various 
thresholds, by means of this language. 

Queries in the Object Comprehensions language are entered similarly to specifying 
a set in mathematics. For example, the set of squares of all odd numbers from the set S 
would be entered this way: 

$\{\, x^2 \mid x \in S,\ \mathrm{Odd}(x) \,\}$ 

This standard mathematical notation was the inspiration for the Object 
Comprehensions language. Here is one example of a query written in this 
language: 

Set [ s <- Student, s.address.city = "Prague" | s ] 

This query returns a collection of all students from the class Student who live in 
Prague. 
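
Readers who know Python may find it helpful to compare both notations with Python comprehensions, which follow the same mathematical idea. The sketch below is only an analogy, not Object Comprehensions syntax; the Student and Address classes and the sample data are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    city: str


@dataclass(frozen=True)
class Student:
    name: str
    address: Address


# Set of squares of all odd numbers from the set S.
S = {1, 2, 3, 4, 5}
odd_squares = {x ** 2 for x in S if x % 2 == 1}

# Analogue of: Set [ s <- Student, s.address.city = "Prague" | s ]
students = {
    Student("Eva", Address("Prague")),
    Student("Jan", Address("Prague")),
    Student("Petr", Address("Brno")),
}
prague_students = {s for s in students if s.address.city == "Prague"}

print(sorted(odd_squares))
print(sorted(s.name for s in prague_students))
```

In both notations a generator (s <- Student, or for s in students) is followed by a filter, and a result expression says what is collected.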

The Object Comprehensions language contains many other constructs for 
entering various types of queries. It also allows the use of user defined functions, which 
can be parametrised. These functions work with objects in the OODB and return 
collections of objects as their results. They therefore behave as subqueries. 

This feature of the Object Comprehensions language is employed in the ODAMIL 
language design. The ODAMIL language defines a new function MineRules which 
performs data mining. Mined rule types, the target data set and other information are 
entered through the parameters of this function. 

MineRules (RuleType: TRuleType; TargSet: Set Of Object; 
           Thresholds: Set Of TThreshold; 
           RelatedAttr: Set Of String) : Set Of Rule; 

The MineRules function has four parameters with the following meaning: 

• Parameter RuleType determines the type of mined rules. Type TRuleType is 
defined as an enumeration: 

TRuleType = Enum (association, sequential, 
                  classification); 

Particular values determine the type of mined rules according to their names. 

• Parameter TargSet defines the target data set. It is essentially a query in the Object 
Comprehensions language. The query is evaluated and the resulting collection of 
objects represents the target data set. 

• Parameter Thresholds can contain particular threshold definitions. If this 
parameter is empty, implicit thresholds are used. Otherwise it contains a set of 
records, each of which sets one threshold. The element type of parameter Thresholds 
is defined as follows: 

TThreshold = Record 
    IDThreshold: Enum (minsupp, minconf); 
    Value: Real; 
end; 

Attribute IDThreshold determines the threshold type. There are two threshold 
types, namely minimal support (determined by keyword minsupp) and minimal 
confidence (determined by keyword minconf). Attribute Value contains 
the threshold value itself, which represents a percentage but is written as a number 
in the interval 0 to 1. 

• Parameter RelatedAttr contains the list of attributes according to which 
classification of the target set will be performed. It is therefore meaningful only for 
mining of classification rules. In other cases it is empty and not relevant. When 
classification rules are mined, the target data set is first processed by a classification 
algorithm ([10, 11]). The algorithm classifies the data set, i.e. divides it into 
categories according to the entered attributes. After this, a suitable data mining algorithm 
is executed which mines classification rules for each data category. 



5 Examples of Mined Rules 

In both the ODAMIL and DMOQL languages the MineRules function is used in 
queries in the same way as other query functions. The usage of both languages is 
demonstrated in the following examples. In each example the first query is written in 
the DMOQL language and the second one in the ODAMIL language. 

Example 1 

MineRules ('association', select * from Purchases where 
           total < 100) 

Set [ r <- MineRules (association, Set [ p <- Purchases, 
      p.total < 100 | p ], Set [], Set []) | r ] 

This query returns a collection of association rules mined from data about petty 
purchases. The target data set is a collection of transactions, i.e. 
particular customer purchases, whose total is below $100. Implicit thresholds 
are used during data mining. 
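
To make the evaluation order of such a query tangible, the following sketch mimics Example 1 in Python: the target set is selected first and is then handed to a mining step. Everything here is a hypothetical stand-in: mine_rules is not the MineRules function of either language, the implicit thresholds of 0.5 are invented (the paper does not state their values), only rules with one item on each side are considered, and support and confidence follow the usual definitions from the association rule literature.

```python
from itertools import permutations

# Hypothetical implicit thresholds; the paper does not give their values.
IMPLICIT_THRESHOLDS = {"minsupp": 0.5, "minconf": 0.5}


def mine_rules(rule_type, target_set, thresholds=None):
    """Toy stand-in for MineRules: mines single-item association rules."""
    if rule_type != "association":
        raise NotImplementedError("only association rules are sketched here")
    thr = {**IMPLICIT_THRESHOLDS, **(thresholds or {})}
    n = len(target_set)
    items = sorted({item for t in target_set for item in t})
    rules = []
    for lhs, rhs in permutations(items, 2):
        lhs_count = sum(1 for t in target_set if lhs in t)
        both = sum(1 for t in target_set if lhs in t and rhs in t)
        supp, conf = both / n, both / lhs_count
        # Keep a rule only if both values are greater than the thresholds.
        if supp > thr["minsupp"] and conf > thr["minconf"]:
            rules.append((lhs, rhs, supp, conf))
    return rules


# Hypothetical content of the Purchases collection.
purchases = [
    {"total": 12.0, "items": {"bread", "milk"}},
    {"total": 30.0, "items": {"bread", "milk", "eggs"}},
    {"total": 8.0, "items": {"bread"}},
    {"total": 250.0, "items": {"tv", "milk"}},
]

# Target set: select * from Purchases where total < 100
target = [p["items"] for p in purchases if p["total"] < 100]
rules = mine_rules("association", target)
for lhs, rhs, supp, conf in rules:
    print(f"{lhs} -> {rhs}  support={supp:.2f}  confidence={conf:.2f}")
```

The subquery plays the same role as TargSet: the mining step never sees the purchase over $100, so items bought only in that purchase cannot appear in any rule.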

Example 2 

MineRules ('sequential', select * from Purchases where 
           (date > 1.1.1997) AND (date < 31.12.1997), 
           set (minsupp 0.75, minconf 0.8)) 

Set [ r <- MineRules (sequential, Set [ p <- Purchases, 
      p.date > 1.1.1997, p.date < 31.12.1997 | p ], 
      Set [Thr (minsupp, 0.75), Thr (minconf, 0.8)], Set []) | r ] 

This query mines sequential patterns from data about transactions performed in 
1997. The target data set is the collection of transactions performed in 1997. 
Thresholds are set so that the minimal support of mined rules is 75% and 
the minimal confidence is 80%. 

Example 3 

MineRules ('classification', select * from Purchases 
           where (date > 1.1.1997) AND (date < 31.12.1997), 
           Set ['Customer.Spending']) 

Set [ r <- MineRules (classification, Set [ p <- Purchases, 
      p.date > 1.1.1997, p.date < 31.12.1997 | p ], Set [], 
      Set ['Customer.Spending']) | r ] 

This query mines classification rules from data about transactions performed in 
1997 using implicit thresholds. The target data set is divided according to 
the customer's spending and then classification rules are mined for each category, e.g. 
which goods are bought by rich and by poor customers, information which could be useful 
for marketing, etc. 



6 Conclusions 

Data mining and knowledge discovery represent a very important area for further 
research and development, not only in the area of relational databases but 
especially in the environment of new progressive database systems such as OODB or 
deductive object-oriented databases (DOOD). These database systems are capable of 
holding large amounts of data and also capture its structure and mutual relationships. 
Precisely these features can be successfully employed in data mining. 

At a time of rapid development within this area there is an increasing need for a data 
mining query language which would enable simple and uniform entering of various 
data mining tasks and which would represent a standard interface between an application 
and a data mining system. 

In this paper we described general conditions and instruments for extending OOQL 
with data mining abilities and the design of the object-oriented data mining query 
languages ODAMIL and DMOQL. These languages are proposed for effective data 
mining in the OODB. 

Future research will probably continue towards the development of new effective 
algorithms for data mining in the OODB and towards the employment of the DOOD 
as a progressive tool for handling data and knowledge. Integration of data mining into 
the environment of DOOD is a very interesting and important area. 


Abstract. Variables' annotations in over-constrained problems make it 
possible to express preferences for optimal solution selection as preferences on 
variables. The basic interpretation of variables' annotations is presented 
and the correspondence with hierarchical CSP is described. A new local 
comparator for constraint hierarchies is proposed and used for solving 
constraints with variables' annotations. The relationships between the standard 
locally-better comparator and the new one are clarified. The potential 
application areas of variables' annotations are also mentioned. 



1 Introduction 

Over-constrained problems are usually solved by giving some preferences or 
weights to individual constraints and defining the solution as a valuation 
which minimizes the violations of constraints. There are, however, 
over-constrained problems with partially or even completely ordered variables. 
Assigning preferences to variables could be more natural than defining preferences 
for constraints artificially. We describe a new constraint solving environment 
where preferences (or annotations) are assigned to individual variables instead 
of to the constraints themselves [?]. Moreover, the annotations are local to 
variable occurrences, i.e., any variable may have different annotations in 
different constraints (in fact, even different occurrences in the same constraint are 
allowed). 

Variables' annotations could be suitable for application areas such as 
planning or scheduling. Using our annotations is advantageous in applications 
where variables have their own preferences. These preferences can be applied 
directly instead of creating unnatural preferences over individual constraints. 
The classical example of such an application is the timetabling problem, where 
variables represent teachers (dean, professors, assistants...), rooms (more and less 
occupied), and different groups of students. For example, a lecture taught by 
a professor should be preferred over another lecture taught by an assistant. 

Let us consider a small real example illustrating the meaning of variable 
preferences. There is a lecture L and its practice P. The practice should preferably 
be taught at least one day after the lecture. By the following 
constraint we would like to express that the professor's lecture is preferred over the 
assistant's practice. 

L@strong + 1 #=< P@medium   % c1 

There are two weaker constraints: the lecture has to be taught on Thursday or 
Friday and the practice from Monday to Thursday: 

L@weak in 4..5   % c2 

P@weak in 1..4   % c3 

These constraints form a kind of hierarchy: the constraint c1, with the highest 
preferences, must be satisfied first, and then we may try to satisfy constraints 
c2 and c3. It is possible to satisfy c1 but not c2 and c3 taken together. The 
constraint c2 affects the variable with the higher annotation (see c1), so 
this constraint is also satisfied. Then, trying to minimize the overall constraint 
violation, we get (a kind of) optimal solution L=4, P=5. In a classical hierarchy 
where c1 is annotated by strong or medium, the solution L=3, P=4 is also 
obtained. But this solution is not optimal from our point of view. To exclude it, the 
different requirements on the lecture and the practice would have to be stated by 
assigning different preferences to c2 and c3, so these constraints would have to be 
ordered. But this could be wrong with respect to other constraints in a more complex 
problem. Also, the two appropriate constraints need not be easy to 
locate in this context. 
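
The claim that a classical hierarchy cannot separate the two solutions can be checked by brute force. The sketch below makes several assumptions that are not stated in the text: days are encoded as 1 (Monday) to 5 (Friday), the error function is the distance to the allowed interval (the domain's metric), c1 is satisfied first, and c2 and c3 share one level with equal weights.

```python
from itertools import product

DAYS = range(1, 6)  # assumption: 1 = Monday ... 5 = Friday


def dist_to_interval(x, lo, hi):
    """Metric error: distance from x to the interval lo..hi (0 if inside)."""
    return max(lo - x, 0, x - hi)


def errors(L, P):
    """Errors of the three constraints for the valuation L, P."""
    return {
        "c1": max(L + 1 - P, 0),          # L + 1 #=< P
        "c2": dist_to_interval(L, 4, 5),  # L in 4..5
        "c3": dist_to_interval(P, 1, 4),  # P in 1..4
    }


def weak_error(L, P):
    """Summed error of the two weak constraints, with equal weights."""
    e = errors(L, P)
    return e["c2"] + e["c3"]


# Satisfy c1 first, then minimise the summed error of c2 and c3.
candidates = [(L, P) for L, P in product(DAYS, DAYS)
              if errors(L, P)["c1"] == 0]
best = min(weak_error(L, P) for L, P in candidates)
optimal = sorted((L, P) for L, P in candidates if weak_error(L, P) == best)
print(optimal)
```

Under these assumptions both (L, P) = (3, 4) and (4, 5) reach the same minimal error, so an ordering of c2 and c3, or variable annotations, is needed to prefer L=4, P=5.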

2 Constraints with Variables’ Annotations 

A constraint system with variables' annotations is derived from standard 
constraint satisfaction [?]. As in other frameworks for solving over-constrained 
problems, an error function $e(c\theta)$ is applied; it indicates how nearly constraint $c$ 
is satisfied for a valuation $\theta$. This error function can be trivial ($e(c\theta) = 0/1$ 
means $c$ is satisfied/unsatisfied) or we can define the error function by using the 
domain's metric. 

The constraint system is extended by variables' annotations in constraints 
[7, 6]. Every variable in every constraint is assigned an annotation from the 
annotation set $A$: $a : C \times V \to A$. There is a function $\odot$ for computing global 
annotations: 

- global variable annotation $av : V \to A$, $av(v) = \bigodot_{\{c \in C \mid v \in var(c)\}} a(c, v)$, 

- constraint annotation $ac : C \to A$, $ac(c) = \bigodot_{\{v \in var(c)\}} a(c, v)$, 

- global constraint annotation $acv : C \to A$, $acv(c) = \bigodot_{\{v \in var(c)\}} av(v)$, 

where $var(c)$ is the set of variables of constraint $c$. 

3 Hierarchy with Global Comparators 

The correspondence between constraints with variables' annotations and constraint 
hierarchies [?] with global comparators is described in this section. The 
hierarchy is constructed over constraint annotations $ac$, with an additional order imposed 
by global constraint annotations $acv$ within each level. 

The annotation set $A$ is the interval $(0, 1]$. Greater values of annotations 
are more preferred. The value 0 is not a member of $A$ because a variable with 
such an annotation plays no role in the constraint system. 

Function $\odot$ is the geometric average over real numbers. The constraint hierarchy 
is constructed using an ordering of constraints $<_c$ which is defined using the constraint 
annotation: 

$$c, d \in C : \quad c <_c d \equiv ac(c) > ac(d) . \qquad (1)$$

The hierarchy of constraints $C = C_0 \cup C_1 \cup \dots \cup C_n$ is a union of disjoint 
sets $C_i$ where the preferences of constraints decrease with increasing value of $i$: 

$$C_0 = \{c \in C \mid ac(c) = 1\} \quad \text{(level with required constraints)} ,$$

$$C_i = \{c \in C \mid (\forall d \in C_j,\ j < i : d <_c c) \wedge (\forall e \in C_k,\ i < k : c <_c e)\}, \quad i > 0 . \qquad (2)$$

The valuation $\theta$ has an error $E(C\theta) = [E(C_1\theta), \dots, E(C_n\theta)]$, where 
$E(C_i\theta) = \sum_{c \in C_i} acv(c)\, e(c\theta)$ holds. The value $E(C_0\theta)$ is not considered because all 
constraints at the level $C_0$ have to be satisfied. The value $acv(c)$ is understood 
as the weight of constraint $c$. The optimal solution $\theta$ has minimal error $E(C\theta)$ 
compared by the weighted-sum-better comparator as in classical hierarchical CSP. 

In a similar way, worst-case-better and least-squares-better comparators can
be applied:

worst-case-better: $E(C_i\theta) = \max_{c \in C_i} acv(c)\, e(c\theta)$ ,
least-squares-better: $E(C_i\theta) = \sum_{c \in C_i} acv(c)\, e(c\theta)^2$ .
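The three level errors differ only in how the weighted constraint errors are combined, which the following sketch makes explicit. The names `level_error` and `hierarchy_error`, the dictionary-based inputs, and the convention that an empty level has worst-case error 0 are assumptions made for illustration.

```python
def level_error(level, acv, err, kind="weighted-sum"):
    """Combine the errors e(c theta) of one level into E(C_i theta).

    level: iterable of constraints; acv: constraint -> weight;
    err: constraint -> error of that constraint under the valuation.
    """
    terms = [(acv[c], err[c]) for c in level]
    if kind == "weighted-sum":
        return sum(w * e for w, e in terms)
    if kind == "worst-case":
        return max((w * e for w, e in terms), default=0.0)
    if kind == "least-squares":
        return sum(w * e ** 2 for w, e in terms)
    raise ValueError(f"unknown comparator kind: {kind}")


def hierarchy_error(levels, acv, err, kind="weighted-sum"):
    """Error vector [E(C_1 theta), ..., E(C_n theta)]; level C_0 is skipped."""
    return [level_error(level, acv, err, kind) for level in levels[1:]]
```

Assuming, as in the classical global comparators, that error vectors are compared lexicographically, two valuations can be compared directly with Python's list comparison, e.g. `hierarchy_error(levels, acv, err_a) < hierarchy_error(levels, acv, err_b)`.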

The timetabling example from the Introduction, solved by the hierarchy with the
weighted-sum-metric-better comparator, is described in detail in [8].

4 Hierarchy with Local Comparators 

The standard local comparator uses no weights. Therefore the decomposition
of $C$ using only the constraint annotation $ac$ is not suitable: all global information
about variables would be omitted. On the other hand, using only the global constraint
annotation $acv$ can place a constraint with smaller variable annotations
in a more important level than another constraint with higher variable
annotations, because the first constraint has a higher value of $acv$. Therefore a
combination of $ac$ and $acv$ has to be used for the construction of the hierarchy.
This is done by a redefinition of the ordering $<_c$ (global comparators use
definition (1)):

$$c, d \in C : \quad c <_c d \equiv (ac(c) > ac(d)) \lor ((ac(c) = ac(d)) \land (acv(c) > acv(d))) .$$

The hierarchy is constructed as above (2) and the locally-better* comparator
can be used for the solution selection. There is no reason to decompose $C_0$
using $acv$ because $C_0$ is the required level, where all constraints have to be satisfied.

* We recall the definition of the locally-better comparator briefly:
locally-better$(\theta, \delta, C) \equiv \exists k \in 1 \dots n$ such that
$(\forall l \in 1 \dots k-1 : (\forall c \in C_l : e(c\theta) = e(c\delta))) \land (\exists c \in C_k : e(c\theta) < e(c\delta)) \land (\forall d \in C_k : e(d\theta) \le e(d\delta))$
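The redefined ordering is a lexicographic comparison on the pair $(ac, acv)$, so the refined levels can be obtained by one sort followed by grouping. The sketch below assumes dictionaries `ac` and `acv` keyed by constraint name; the function name `levels_by_ac_then_acv` is hypothetical.

```python
from itertools import groupby


def levels_by_ac_then_acv(ac, acv):
    """Levels ordered by decreasing ac, ties broken by decreasing acv.

    The required level (ac == 1) is kept whole and is not split by acv.
    """
    required = sorted(c for c in ac if ac[c] == 1.0)
    rest = sorted(
        (c for c in ac if ac[c] != 1.0),
        key=lambda c: (-ac[c], -acv[c], c),
    )
    result = [required]
    # Constraints with the same (ac, acv) pair share one level.
    for _, group in groupby(rest, key=lambda c: (ac[c], acv[c])):
        result.append(list(group))
    return result
```

Note that a constraint with a large `acv` but a small `ac` still ends up in a less important level than any constraint with a larger `ac`, which is the behaviour the combination of both annotations is meant to guarantee.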

Another possibility is introduced by the definition of a new local comparator,
which uses an ordering of constraints at every level. Levels are defined using the
constraint annotation $ac$, and the ordering within each level is then defined
through the global constraint annotation $acv$.

Definition 1. A valuation $\theta$ is ordered-better than another valuation $\delta$ if, for
each of the constraints through some level $k-1$, the error after applying $\theta$ is equal
to that after applying $\delta$, and at the level $k$ the errors are compared with respect
to an ordering $<_w$ of a set $W$ given by a function $w : C \to W$ (the proposition
$w(c) <_w w(d)$ means that $c$ is preferred over $d$):

ordered-better$(\theta, \delta, C) \equiv$

$\exists k \in 1 \dots n$ such that

$\forall l \in 1 \dots k-1\ \forall c \in C_l : e(c\theta) = e(c\delta)$

$\land\ \exists c \in C_k : e(c\theta) < e(c\delta)$

$\land\ \forall d \in C_k$ such that $w(d) \le_w w(c) : e(d\theta) \le e(d\delta)$ .

The valuation $\theta$ is an ordered-better solution if no valuation $\omega$ ordered-better
than $\theta$ exists.

All constraints at level $C_0$ have to be satisfied and therefore we may restrict
ourselves to levels $1 \dots n$ only. We can choose a trivial error function $e$ ($e(c\theta) = 0/1$
means $c$ is satisfied/unsatisfied) or a metric function (using a metric on the variables'
domain), and then we get the ordered-predicate-better or ordered-metric-better
comparator, respectively.

Now we can define the mapping of constraints with variables' annotations to a
constraint hierarchy with the ordered-better comparator exactly. The hierarchy is
constructed from the constraint annotation $ac$ as in Section 3, and the ordered-better
comparator chooses the better solution. The function $w$, the set $W$, and the
ordering $<_w$ correspond to $acv$, $(0, 1]$, and $>$ over real numbers, respectively.
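Both local comparators can be written as short predicates, which makes their difference easy to test. In this sketch the error function is passed as a callable `e(constraint, valuation)`, `levels` is the list `[C_1, ..., C_n]` without the required level, and `w` is a dictionary of numeric ranks in which a smaller rank means a more preferred constraint; all of these representation choices are assumptions (for the mapping above one could take `w[c] = -acv[c]`).

```python
def locally_better(theta, delta, levels, e):
    """True if theta is locally-better than delta."""
    for level in levels:
        if all(e(c, theta) == e(c, delta) for c in level):
            continue  # identical errors so far, inspect the next level
        # First level with a difference: theta must be nowhere worse.
        return all(e(c, theta) <= e(c, delta) for c in level)
    return False


def ordered_better(theta, delta, levels, e, w):
    """True if theta is ordered-better than delta (smaller w = preferred)."""
    for level in levels:
        differing = [c for c in level if e(c, theta) != e(c, delta)]
        if not differing:
            continue
        # Some constraint c improves, and no constraint that is at least as
        # preferred as c gets worse.
        return any(
            e(c, theta) < e(c, delta)
            and all(e(d, theta) <= e(d, delta) for d in level if w[d] <= w[c])
            for c in differing
        )
    return False
```

With a single level `{c, d}`, `w = {"c": 1, "d": 2}`, and two valuations where the first is better on `c` and the second is better on `d`, only the first valuation is ordered-better than the other, while neither is locally-better than the other.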

4.1 Ordered-Better and Locally-Better Comparators 

In this section, the relations between ordered-better and locally-better
comparators are clarified. We will also show that both described mappings with local
comparators give the same solutions.

In the following, we suppose that $C = \{c_1, c_2, \dots, c_m\}$ is a constraint hierarchy
with levels $C_0, C_1, \dots, C_n$, an ordering $<_w$, and a function $w$.

Lemma 1. Every ordered-better solution $\theta$ of hierarchy $C$ is locally-better.

Proof. Let us assume that an ordered-better valuation $\theta$ is not locally-better. Then
a valuation $\omega$ exists which is locally-better than $\theta$. Next, let $C_k$ be the first
level where valuations $\omega$ and $\theta$ have different values of the error function on some
constraints. Because the valuation $\omega$ is locally-better than the valuation $\theta$:

$$\forall d \in C_k : e(d\omega) \le e(d\theta) . \tag{3}$$

The level $C_k$ is the first level where any error functions differ, and so the next
proposition follows from ordered-better$(\theta, \omega, C)$ (see Definition 1)

$$\exists d \in C_k : e(d\theta) < e(d\omega) \tag{4}$$

which contradicts proposition (3). So, we obtain that no valuation $\omega$
locally-better than $\theta$ exists and the valuation $\theta$ has to be a locally-better
solution. □



There are locally-better solutions which are not ordered-better. For example,
let us consider the hierarchy $C = C_1 = \{c, d\}$ where $w(c) <_w w(d)$ holds. Let there
exist solutions $\omega$ and $\theta$ such that $e(c\omega) > e(c\theta)$, $e(d\omega) < e(d\theta)$. Both solutions
could be locally-better, but only $\theta$ could be ordered-better because it is
ordered-better than $\omega$.

The next part concentrates on the exact specification of the relation between the
locally-better and ordered-better comparators.



Definition 2. Let $C = \bigcup_{i=0}^{n} C_i$ be a hierarchy and $w : C \to W$ a weight function.
The hierarchy refinement $C/w$ is defined by $C/w = \bigcup_{i=0}^{n} C_i/w$, $C_i/w = \bigcup_{j=1}^{n_i} C_{ij}$,
if the proposition $C_0/w = C_0$ holds and $C_{ij}$ is given for $\forall i \in 1 \dots n$, $\forall j \in 1 \dots n_i$ by the formula

$$(\forall c \in C_i\ \forall d \in C_i : (c \in C_{ij}, d \in C_{il}, l \in 1 \dots n_i, j < l) \Leftrightarrow (w(c) <_w w(d))) .$$

The weights have no meaning in the required level and we can suppose the same weights
for every $c \in C_0$. So $C_0/w = C_0 = C_{01}$ is justified.

A hierarchy refinement is a hierarchy where the level $C_{ij}$ is more important
than $C_{kl}$ iff $(i < k) \lor ((i = k) \land (j < l))$ holds. The level $C_0$ is required and all
constraints have to be satisfied for every solution. So, we may restrict ourselves
to levels $1 \dots n$ when comparing potential valuations.
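A hierarchy refinement can be computed by sorting each level by weight and grouping constraints of equal weight into one sublevel. The sketch below is an illustration only: `refine` and `flatten` are hypothetical names, and `w` maps each constraint to a numeric rank where a smaller value means a more preferred constraint.

```python
from itertools import groupby


def refine(levels, w):
    """Return C/w as a list of levels, each a list of sublevels C_ij.

    levels = [C_0, C_1, ..., C_n]; the required level C_0 is kept as the
    single sublevel C_01.
    """
    refined = [[list(levels[0])]]
    for level in levels[1:]:
        ordered = sorted(level, key=lambda c: w[c])  # stable sort
        refined.append(
            [list(group) for _, group in groupby(ordered, key=lambda c: w[c])]
        )
    return refined


def flatten(refined):
    """Sublevels in order of importance: C_01, C_11, C_12, ..., C_n1, ..."""
    return [sub for level in refined for sub in level]
```

The flattened list is itself an ordinary hierarchy, so a locally-better comparator can be applied to it unchanged.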



Lemma 2. For a given hierarchy $C$, weight function $w$, and valuations $\theta$ and $\delta$,
the proposition ordered-better$(\theta, \delta, C) \leftrightarrow$ locally-better$(\theta, \delta, C/w)$ holds.

Proof. ($\rightarrow$): Let $\theta$ and $\delta$ be valuations of hierarchy $C$ and let $\theta$ be ordered-better
than $\delta$. Let $k$ be the first level where the error function on valuations $\theta$
and $\delta$ differs, and let $c \in C_k$ be a constraint with minimal weight $w(c)$ such that
$e(c\theta) \ne e(c\delta)$ holds. Next, let $c \in C_{kl}$ hold for some $l \in 1 \dots n_k$ in hierarchy $C/w$.
We show that $\theta$ is locally-better than $\delta$ in $C/w$.

1. $e(d\theta) = e(d\delta)$ holds for every $d \in C_{ij}$, $i \in 1 \dots (k-1)$, $j \in 1 \dots n_i$, because
the same holds for every $d \in C_i$, $i \in 1 \dots (k-1)$ (the error function on $\theta$ and $\delta$
differs in the level $k$ for the first time).

2. $e(d\theta) = e(d\delta)$ holds for every $d \in C_{kj}$, $j \in 1 \dots (l-1)$. The explanation of this
fact follows. Firstly, $d \in C_k$ and $w(d) <_w w(c)$ are obtained from Definition 2
and $j < l$. The constraint $c$ has minimal weight in $C_k$ such that the error function
on $\theta$ and $\delta$ differs. This entails $e(d\theta) = e(d\delta)$ for $d \in C_{kj}$.

3. $e(c\theta) < e(c\delta)$ holds because $c$ is the first by level and weight with distinct
values of the error function on $\theta$ and $\delta$, and $\theta$ is ordered-better than $\delta$.

4. $e(d\theta) \le e(d\delta)$ holds for every $d \in C_{kl}$ because $\theta$ is ordered-better than $\delta$,
$d \in C_k$ and $w(d) \le_w w(c)$.

We have shown that $e(d\theta) = e(d\delta)$ holds for $\forall d \in C_{ij}$, $(i < k) \lor ((i = k) \land (j < l))$;
next, $e(c\theta) < e(c\delta)$ and $\forall d \in C_{kl} : e(d\theta) \le e(d\delta)$ hold. This means
that $\theta$ is locally-better than $\delta$ in $C/w$.

($\leftarrow$): This proof is very similar to the opposite direction. Let $\theta$ be locally-better
than $\delta$ in $C/w$. Let the error function differ for $c \in C_{kl}$ first, so $e(c\theta) < e(c\delta)$
holds. The statement $\forall i < k\ \forall d \in C_i : e(d\theta) = e(d\delta)$ is implied by the
locally-better comparator definition ($\forall i \forall j$ such that $(i < k) \lor ((i = k) \land (j < l))$,
$\forall d \in C_{ij} : e(d\theta) = e(d\delta)$). For the same reason, $e(d\theta) = e(d\delta)$ holds for
$\forall d \in C_k$ such that $w(d) <_w w(c)$. The proposition $e(d\theta) \le e(d\delta)$ holds for
$\forall d \in C_k : w(d) =_w w(c)$ because $d \in C_{kl}$ holds, the error functions on $\theta$ and $\delta$ differ
on $C_{kl}$ first, and $\theta$ is locally-better than $\delta$ in $C/w$. So, all necessary conditions
are satisfied and $\theta$ is ordered-better than $\delta$ in $C$. □

Theorem 1. The valuation $\theta$ is an ordered-better solution of hierarchy $C$ with
weight function $w$ iff $\theta$ is a locally-better solution of the hierarchy refinement $C/w$.

Proof. A consequence of Lemma 2. □

Now we can go back to our mappings with local comparators. The ordered-better
comparator was applied to the hierarchy $C$ constructed using $ac$ and to the weight
function $acv$. The hierarchy with the locally-better comparator was constructed
using $ac$ and then $acv$. Such a hierarchy is the hierarchy refinement $C/acv$.
Applying Theorem 1, we obtain that both mappings compute the same solutions.

4.2 Algorithm for Solving the Hierarchy 

This part gives tools for solving a system of constraints with variables' annotations
using constraint hierarchies with the ordered-better comparator. The basic algorithm
with its complexity analysis is described at the end.

Definition 3. A sequence $SC = (c_1, \dots, c_m)$ is a hierarchy-ordering of hierarchy
$C$ with $m$ constraints if all constraints of $SC$ are sorted by the level of the
hierarchy ($c_i \in C_k$, $c_j \in C_l$, $k < l$ implies $i < j$) and by the ordering $<_w$ (for
$c_i, c_j \in C_k$, $w(c_i) <_w w(c_j)$ implies $i < j$). The sequence $(c_1, c_2, \dots, c_i)$
is denoted $SC_i$ for $i \le m$.

Definition 4. Let $SC = (c_1, c_2, \dots, c_m)$ be a hierarchy-ordering of hierarchy $C$.
The recursively defined set $S = S_m$ is called the ordering-solution-set of the
hierarchy-ordering $SC$ if

$$S_0 = \{\theta \mid \theta \text{ is a valuation of } SC\} ,$$

$$S_i = \{\theta \mid \theta \in S_{i-1} \land e(c_i\theta) = \min_{\omega \in S_{i-1}} e(c_i\omega)\} \quad \text{for } i \in 1 \dots m$$

holds.
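For a finite set of candidate valuations, the recursion of Definition 4 is a simple filtering loop. The following sketch assumes that the candidates can be enumerated and that the error function is given as a callable `e(constraint, valuation)`; both the function name and this finite setting are assumptions made for illustration.

```python
def ordering_solution_set(valuations, sc, e):
    """Compute S_m for the hierarchy-ordering sc over finitely many valuations.

    S_0 is the list of all candidate valuations; S_i keeps exactly those
    members of S_{i-1} that minimise the error of the i-th constraint.
    """
    current = list(valuations)
    for c in sc:
        if not current:
            break
        best = min(e(c, theta) for theta in current)
        current = [theta for theta in current if e(c, theta) == best]
    return current
```

As a usage example with hypothetical data, take one integer variable `B` ranging over 0..20, a constraint `c1` with metric error `max(0, 10 - B)` and a constraint `c2` with metric error `max(0, B - 8)`. The ordering `(c1, c2)` yields `[10]` and the ordering `(c2, c1)` yields `[8]`.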






Lemma 3. Let us consider a constraint hierarchy $C$ with weight function $w$. If
$w(d) \ne w(f)$ holds for every two constraints $d, f \in C_k$, for all $k \in 1 \dots n$, then
the value of the error function $e(c\theta)$ is determined uniquely for every constraint
$c \in C$ and for every ordered-better solution $\theta$.

Proof. Let $C$ and $w$ satisfy the mentioned properties. There is only one
hierarchy-ordering $SC$ of such a hierarchy $C$. We show that the set $S_i$ from Definition 4
is the set of all ordered-better solutions of $SC_i$ for $\forall i \in 1 \dots m$. So the value
of $e(c_i\theta)$ is uniquely determined for every $i$.

The proof is by induction on $i$. The base case $i = 0$ is trivial because $SC_0$ is
empty and $S_0$ is the set of all valuations of the hierarchy.

Suppose that the proposition holds for $SC_{i-1}$ and now consider the solutions
of $SC_i = (c_1, \dots, c_i)$. Let $R$ denote the set of all ordered-better solutions of $SC_i$.
Constraint $c_i$ belongs to a higher level of the hierarchy than $c_j$ (for $\forall j < i$) or to
the same level, and then $w(c_j) <_w w(c_i)$ holds. This entails $R \subseteq S_{i-1}$. Let
$\omega \in R$ exist such that the value $e(c_i\omega)$ is not minimal. Then $\theta \in S_{i-1}$ exists with
minimal $e(c_i\theta)$, which entails $\theta \in S_i$ and $e(c_i\omega) > e(c_i\theta)$. Next, $e(c_j\omega) = e(c_j\theta)$ is
implied for $\forall j < i$ from $\theta, \omega \in S_{i-1}$. So, valuation $\theta$ is ordered-better than $\omega$. The
valuation $\omega$ cannot be a member of the set $R$, which consists of ordered-better
solutions only. For all $\delta \in R$ the value $e(c_i\delta)$ has to be minimal, and so $R = S_i$
is obtained. □



Theorem 2. Let $SC$ be a hierarchy-ordering of $C$ and let the set $S$ be the
ordering-solution-set of $SC$. Then $S$ is a set of ordered-better solutions.

Proof. The proof is by induction on the number of constraints $m$. The base case
is $m = 1$. The hierarchy is $C = \{c_1\}$ and only one $SC = (c_1)$ exists. We
obtain $S = S_1 = \{\theta \mid \forall\omega : e(c_1\theta) \le e(c_1\omega)\}$ and so every valuation $\theta \in S$ is
an ordered-better solution.

Suppose that the proposition holds for a hierarchy with $m$ constraints and
now consider the case with $m+1$ constraints. Let us suppose $\theta \in S_{m+1}$ and show
for every valuation $\delta$ that either $\theta$ is ordered-better than $\delta$ for $SC_{m+1}$ or $\delta$ is not
ordered-better than $\theta$ for $SC_{m+1}$ ($\theta$ and $\delta$ are not comparable for $SC_{m+1}$).

1. $\delta \notin S_{m+1} \land \delta \in S_m$: The value of the error function for every $c_i$ ($i \in 1 \dots m$) is
determined uniquely, which follows from the assumption $\delta \in S_m$ and the definition of
the ordering-solution-set. The inequality $e(c_{m+1}\theta) < e(c_{m+1}\delta)$ follows from the
assumption $\delta \notin S_{m+1}$ and the minimal value of the error function for $c_{m+1}\theta$. Together,
these properties imply that $\theta$ is ordered-better than $\delta$.

2. $\delta \in S_{m+1}$: The error function for every constraint is the same again, so
no constraint $c_i$ ($i \in 1 \dots m+1$) exists such that $e(c_i\theta) > e(c_i\delta)$ (or $<$), and
neither $\delta$ nor $\theta$ is ordered-better than the other valuation for $SC_{m+1}$.

3. $\delta \notin S_m$: $\theta \in S_m$ and so $\delta$ cannot be ordered-better than $\theta$ for $SC_m$ by the
induction assumption. We show that adding $c_{m+1}$ does not change
this situation for $SC_{m+1}$. The value of the error function for some $i \in 1 \dots m$
differs for $\theta$ and $\delta$ (from $\delta \notin S_m$). Let $i$ be the first of them and suppose
$c_i \in C_k$ and $e(c_i\theta) < e(c_i\delta)$ (and analogously for $>$). $\theta$ and $\delta$ are not comparable
for $SC_m$. So, some $c_j \in C_k$ such that $w(c_j) \le_w w(c_i)$ and $e(c_j\theta) > e(c_j\delta)$
hold has to exist. The proposition $w(c_j) =_w w(c_i)$ holds because $i$ is the
smallest index ($j > i$) and $SC_m$ is a hierarchy-ordering. These differences
induce incomparability for $SC_{m+1}$ too.

Therefore every $\theta \in S$ is an ordered-better solution. □

There are ordered-better solutions which cannot be obtained using any
hierarchy-ordering as its ordering-solution-set. Let us consider the example
$C = C_1 = \{c1, c2\}$

    B >= 10    % c1
    B =< 8     % c2

where $w(c1) = w(c2)$ holds. The valuation {B = 10} is obtained for the
hierarchy-ordering (c1, c2) and {B = 8} for (c2, c1). Both valuations are ordered-better,
but, for example, the valuation {B = 9} is ordered-better, too.

The algorithm for solving constraints with variables' annotations is based on
Theorem 2 and the Indigo algorithm for local propagation by means of
interval arithmetic. The Indigo algorithm handles an acyclic* set of inequality
constraints with the complexity $O(|C| \times |V|)$. The key idea in Indigo is
that lower and upper bounds on variables (i.e. intervals) are propagated, and
the constraints are processed from strongest to weakest, tightening the bounds
on variables step by step using interval arithmetic.

Our solution is divided into three parts:

1. splitting the set of constraints $C$ with variables' annotations into the constraint
hierarchy $\{C_0, C_1, \dots, C_n\}$ using the constraint annotation $ac$ and the ordering $<_c$,

2. sorting the constraints in every level $C_i$ of the hierarchy using the global constraint
annotation $acv$ into an output sequence of constraints $OC_i$,

3. applying the Indigo algorithm to the input constraints sorted by the
sequence $(OC_0, OC_1, \dots, OC_n)$.
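To give a feeling for the third step, the sketch below processes a sorted sequence of bound constraints on a single variable and tightens an interval one constraint at a time. It is not the Indigo algorithm; it is a deliberately simplified, single-variable illustration of the interval-tightening idea, and the constraint encoding as `(operator, value)` pairs and the function names are assumptions.

```python
import math


def tighten(bounds, op, value):
    """Tighten the interval (lo, hi) by one constraint `x op value`.

    If the constraint cannot be satisfied inside the interval, the interval
    collapses to the point with the smallest metric error for it.
    """
    lo, hi = bounds
    if op == ">=":
        return (max(lo, value), hi) if value <= hi else (hi, hi)
    if op == "<=":
        return (lo, min(hi, value)) if value >= lo else (lo, lo)
    raise ValueError(f"unsupported operator: {op}")


def solve_single_variable(sequence):
    """Process constraints from strongest to weakest, as sorted by the caller."""
    bounds = (-math.inf, math.inf)
    for op, value in sequence:
        bounds = tighten(bounds, op, value)
    return bounds
```

Processing `x >= 10` before `x <= 8` leaves the interval `(10, 10)`, whereas the opposite order leaves `(8, 8)`, so the result depends on the hierarchy-ordering of the input.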



Theorem 3. Given an acyclic set of constraints, the algorithm computes an
ordered-metric-better solution.

Proof. The input constraints for the Indigo algorithm define a hierarchy-ordering $SC$
using $OC_0, OC_1, \dots, OC_n$. The Indigo algorithm minimizes the error function in the
order given by the hierarchy-ordering $SC$. These are the requirements of Theorem 2, and
so we obtain an ordered-metric-better solution as the result of the algorithm. □

* The bipartite constraint graph is acyclic. The vertices of this graph are variables and
constraints. An edge is created between a variable and a constraint when the variable
occurs in this constraint.

Let us denote $m = |C|$, $k = |V|$ and consider the complexity of the algorithm. In
the first step, the constraint annotation is computed for every constraint. Because
every constraint contains at most $k$ variables, the complexity of this part is
$O(mk)$. The complexity of sorting $m$ constraints is $O(m \log m)$. The second step
computes global variable annotations, and because every variable is contained in
at most $m$ constraints, the complexity $O(mk)$ is obtained. The complexity of
computing the global constraint annotations is also $O(mk)$, and sorting the particular
disjoint sets of altogether $m$ constraints takes $O(m \log m)$ steps. The complexity of the
last step is $O(mk)$. As a result, we get the total complexity $O(m(k + \log m))$.
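Written out as a single sum, the bound follows directly from the per-step costs listed above (the name $T(m,k)$ for the total running time is introduced here only for convenience):

```latex
\begin{align*}
T(m,k) &= \underbrace{O(mk) + O(m \log m)}_{\text{step 1}}
        + \underbrace{O(mk) + O(mk) + O(m \log m)}_{\text{step 2}}
        + \underbrace{O(mk)}_{\text{step 3}} \\
       &= O(mk + m \log m) = O\bigl(m (k + \log m)\bigr).
\end{align*}
```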

The described algorithm for solving inequality constraints with variables'
annotations, mapped to the hierarchy with the ordered-better comparator, was
implemented in Prolog with attributed variables and mutable terms.

5 Conclusions and Future Work 

A new approach to solving over-constrained problems using variables' annotations
was described. This approach could be suitable for application areas like
planning or scheduling. We defined the complete mapping from variables'
annotations to constraint hierarchies. We proposed a new local comparator for solving
problems with variables' annotations. We described the relation between the standard
locally-better comparator and the new ordered-better comparator. We have shown
that the weights used for the ordered-better solution selection enable a more exact
specification of preferences than the locally-better comparator without redefinition
of the hierarchy.

Future work will consist of a precise interpretation of constraints with
variables' annotations which manipulates these constraints more efficiently. We
will consider the properties and scope of such an interpretation with respect to
real problems. We would like to concentrate on incremental manipulation
of our constraints. Incremental manipulation means that adding a new
constraint does not require a complete recomputation of a previously computed
solution. Another interesting point is the so-called "computation with
annotations": let us imagine a constraint system with annotations and a simplification
of this constraint system together with a suitable transformation of annotations
so that the solutions of both are the same. Attention will also be devoted to the
study of suitable algorithms for solving systems of constraints with emphasis on
variables' annotations and the ordered-better comparator.
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Abstract. The exact minimization of the size of Ordered Binary Deci- 
sion Diagrams (OBDD) is known to be an NP-complete problem. The 
available heuristic solutions of the problem still do not satisfy the
requirements of practical applications. The development of efficient
algorithms that find acceptable variable orders within a short time and with
a modest memory overhead is hence highly desired.

In this paper we contribute to the solution of the minimization problem
with a new variable reordering heuristic that is based on sampling. A small
OBDD sample is chosen from the OBDDs that are considered for
minimization. Solving the problem for this small sample, we obtain a variable
order that is extrapolated and applied to the entire OBDDs. We present
the first experimental results with Sample Reordering targeted at
combinatorial verification. The suggested heuristic is substantially faster
than Sifting.



1 Introduction 

The Ordered Binary Decision Diagram (OBDD), as a scheme for the representation of
Boolean functions, is applicable to all problems over a finite domain. Because
of its excellent algorithmic properties, the OBDD is the favorite data structure in
computer-aided design, verification, and testing of digital systems. The powerful
computer industry spends huge financial resources to support
research aimed at the development and application of mathematical methods in its
design groups. One of the "hot" topics is BDD-based technology.

Despite deep theoretical investigation and wide practical exploitation of the
OBDD model, there are still many theoretical as well as practical problems that
remain unsolved. The problem of highest priority for all practical applications
that use OBDDs is their conciseness. Since the size of the OBDD representation
of a function may vary exponentially for different orders of variables, the
problem of finding an optimal variable order that realizes the minimal size is
of special interest. This problem is known to be NP-complete. The exact
algorithm works well for small numbers of variables only, and cannot be
used for general purposes. More relevant to practice are variable reordering
heuristics.

* This work was partially supported by the German Research Society (DFG) via the
project Me 1077/12-1, while the first author worked at the Institute of Telematics.
A preliminary version was presented at the International Workshop on Logic Synthesis,
Lake Tahoe, California.



The variable reordering problem of OBDDs is a typical combinatorial
problem with a huge search space of feasible solutions. In addition, there is no
known efficient method for the exact evaluation of a variable order with respect to
the size of the corresponding OBDDs. More precisely, given an OBDD for a
function $f$ and a variable order $\pi$, we have no efficient procedure to estimate the
size of $\pi$OBDD($f$). The only way is to construct the OBDD, which
can be performed efficiently only when the size of the resulting
OBDD is polynomially related to the size of the initial OBDD. This
makes the problem even more complicated, since heuristics that choose several
candidates for a good order cannot avoid constructing the OBDDs for
their evaluation. This is the reason why methods like simulated annealing or
genetic reordering algorithms take too much time.



When a problem appears unsolvable in its full dimension, a natural approach
is to reduce it to problems of lower dimensions. This idea can be found in several
reordering heuristics: e.g., Sifting looks for a good position for one variable,
thus obtaining a feasible solution that improves the size of the OBDD. This step
is then repeated for all variables. Another example is Block-restricted Sifting,
where OBDDs are partitioned horizontally into blocks that are minimized
independently.



In this paper, we follow the idea mentioned above, combining it with the
idea of sampling: a part of the considered OBDDs is taken as a representative
sample and the variable order problem is solved for it. This subproblem has a
substantially lower dimension and hence it is possible to quickly find a feasible
solution for the initial problem. This solution is used as an approximation of a
good variable order for the entire OBDDs. The goal is to obtain not the best, but
an acceptable order in a short time. The suggested Sampling Reordering method
is presented as it was implemented in the advanced Decision Diagram package of
Colorado University at Boulder. We report experimental results on an
example of symbolic simulation of circuits, which is the core of combinatorial
verification. Our experimental evaluation showed the time advantage of the
suggested reordering method. Neither method had a clear advantage
regarding the final size of the OBDDs. Encouraged by these results, we started to
work on other applications, such as sequential verification. The main
idea is to speed up BDD-based operations by focusing on a subset of the considered
OBDDs. We believe that Sampling is a good basis for such application-driven
reordering.



The paper is structured as follows: The next section provides the
reader with all necessary notions and facts regarding OBDDs. Implementation
details of the sample reordering are discussed in Section 3. Section 4 contains
experiments with the method used dynamically during symbolic simulation of
benchmark circuits. Evaluation of the method is performed by comparison with
the most stable and widely used Sifting algorithm as proposed by Rudell.

2 Preliminaries 

2.1 Definitions 

In order to make the paper self-contained, we give definitions of the notions used
in this paper. We start with the definition of an OBDD and its interpretation as a
representation scheme for Boolean functions.

An Ordered Binary Decision Diagram (OBDD) $P$ over a set of Boolean
variables $X_n$ is a multi-rooted directed acyclic graph with the following properties:

1. Sink nodes are labelled by the Boolean constants 0 and 1.

2. Each internal node is labelled by a variable from $X_n$ and has two
distinguishable successors called the low and high son, respectively.

3. On any path, any variable occurs at most once. The order of occurrence
of variables defines an order $\pi$ over $X_n$, i.e., if $x_i$ precedes $x_j$ on a path,
then $x_i <_\pi x_j$.

A $\pi$OBDD is an OBDD with the variable order $\pi$. The size of an OBDD $P$
is measured by the number of its non-sink nodes and is denoted by $|P|$.

For any node $u$ of $P$, an assignment $a : X_n \mapsto \{0,1\}^n$ naturally defines a
computational path with the initial point in $u$ and the terminal point in a sink: if the
path contains a node $v$ labelled by $x_i$, and $a(x_i) = 0$ ($a(x_i) = 1$), then the path
contains the low (respectively, high) son of $v$. The node $u$ represents a Boolean function
$f(x_1, \dots, x_n)$, $f : \{0,1\}^n \mapsto \{0,1\}$, if for each assignment $a$, the corresponding
path terminates in a sink labelled by $f(a(x_1), \dots, a(x_n))$. $P$ represents
multiple Boolean functions represented by its roots.

An OBDD is called reduced if no two nodes represent the same function.
The OBDD nodes that are labelled by the same variable form a node level.
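The computational path and the size measure defined above translate directly into code. In this minimal sketch the class `Node`, the use of Python booleans as sink nodes, and the function names are assumptions chosen for illustration; shared nodes are recognised by object identity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    """Internal OBDD node; sinks are represented by the booleans False/True."""
    var: str
    low: "Union[Node, bool]"
    high: "Union[Node, bool]"


def evaluate(u, assignment):
    """Follow the computational path from node u under the given assignment."""
    while isinstance(u, Node):
        u = u.high if assignment[u.var] else u.low
    return u


def size(roots):
    """Number of distinct non-sink nodes reachable from the given roots."""
    seen, stack = set(), list(roots)
    while stack:
        n = stack.pop()
        if isinstance(n, Node) and id(n) not in seen:
            seen.add(id(n))
            stack.extend((n.low, n.high))
    return len(seen)
```

For example, the conjunction of `x1` and `x2` under the order `x1, x2` is `Node("x1", False, Node("x2", False, True))`, which has size 2.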

2.2 OBDD Properties 

Due to the Shannon decomposition theorem, any Boolean function over $X_n$ has a
$\pi$OBDD representation for any variable order $\pi$ over $X_n$.

Fact 1

1. The reduced $\pi$OBDD for a function $f$ is the unique and minimal (w.r.t. its size)
$\pi$OBDD representation of $f$.

2. Any OBDD can be reduced in linear time.

The suitability of the OBDD as a data structure for Boolean manipulation is
implied by its excellent algorithmic properties:

Fact 2 There are polynomial time algorithms for the following operations
over functions represented by OBDDs:

1. Boolean binary operations

2. building cofactors (i.e., restriction of a function by fixing the value of a
variable)

3. evaluation of the represented function with respect to a given assignment

4. satisfiability and tautology tests, and computing the number of satisfying
assignments

5. existential and universal quantification over a constant number of variables

An equivalent modification of an OBDD by the exchange of two neighbouring
variables in the order is a local operation called swap.

Fact 3 ([14]) The swap operation between the $i$-th and $(i+1)$-st node levels can be
done in time and space $O(|L_i| + |L_{i+1}|)$.

The swap operation is the basic step in the most popular variable reordering
algorithm, Sifting. Sifting considers the Boolean variables in the OBDD to be
reordered in descending order with respect to the size of the corresponding
node levels. A processed variable is moved through the whole order by means of
swaps and the size of the OBDD is monitored. Afterwards, the variable is placed
at the position where the minimal OBDD size was reached. It is not known how
close to optimal the solution found by Sifting is. However, because of its universality
and easy implementation, it is the most widely used variable reordering heuristic.
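
The Sifting loop described above can be illustrated on very small functions with a brute-force size oracle. The sketch below is not how an OBDD package works: instead of swapping adjacent levels and tracking the size incrementally, it recomputes the reduced OBDD size of every candidate order from the truth table, which is only feasible for a handful of variables. The function names and the representation of a Boolean function as a callable on a dictionary are assumptions.

```python
from itertools import product


def level_sizes(f, order):
    """Node count per level of the reduced OBDD of f under `order`.

    Level i holds one node per distinct subfunction (obtained by fixing the
    first i variables) that essentially depends on order[i].
    """
    n = len(order)
    sizes = []
    for i in range(n):
        rest = order[i:]
        seen = set()
        for prefix in product((0, 1), repeat=i):
            fixed = dict(zip(order[:i], prefix))
            table = tuple(
                f({**fixed, **dict(zip(rest, bits))})
                for bits in product((0, 1), repeat=n - i)
            )
            half = len(table) // 2
            if table[:half] != table[half:]:  # depends on order[i]
                seen.add(table)
        sizes.append(len(seen))
    return sizes


def sift(f, order):
    """One Sifting pass: largest levels first, each variable to its best position."""
    order = list(order)
    initial = dict(zip(order, level_sizes(f, order)))
    for var in sorted(order, key=lambda v: -initial[v]):
        others = [v for v in order if v != var]
        candidates = [others[:p] + [var] + others[p:] for p in range(len(order))]
        order = min(candidates, key=lambda o: sum(level_sizes(f, o)))
    return order
```

For the function `(x1 and y1) or (x2 and y2)`, the order `x1, x2, y1, y2` gives 6 nodes, and one pass of this sketch finds an order with 4 nodes.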
Restructuring of an OBDD with respect to a new variable order can be performed
efficiently, too.

Fact 4 Given an OBDD $P$ over $X_n$ and a variable order $\pi$ over
$X_n$, an OBDD $Q$ that is functionally equivalent to $P$ and whose
variable occurrence satisfies $\pi$ can be constructed in time and space polynomial in
$|P| + |Q|$.

3 Sampling Method in Variable Reordering 

In this section, we describe Sample Reordering - the suggested application of 
sampling technique to variable reordering of OBDDs. The main idea is to find a 
good variable order for a small sample of given OBDDs and to adapt it to the 
entire multirooted OBDD. There are three basic questions to be discussed: 

How to find the sample? 

How to minimize it? 

How to adapt the variable order of a minimized sample to the entire OBDDs? 

Each of these points may have an essential influence on the final solution. From 
our experience, simple methods are often more effective in practical applications 
than more sophisticated and complex ones. Hence, we will start with the simplest 
variant and then discuss possible improvements. 

A more appropriate phrasing of the first question would be: "How to find a
good sample?", where good means that any variable order that is optimal for the
sample is efficiently transformable into an order that is optimal for the entire
OBDDs. Obviously, because of the NP-hardness of the optimal variable order
problem, there is no hope of finding an efficient algorithm for finding a small (e.g.,
less than half the size of the considered OBDDs) sample with the desired property
mentioned above. Moreover, the same holds if we look for a small sample whose
reduction yields an order that is efficiently transformable at least to a better
order for the entire OBDDs. Therefore, a realistic goal is to look for the answer to
the following question: How to find a sample that (at least often) assures an
order that is better than the initial order of the OBDDs? In the very first step,
we choose a sample in a random manner, as is usual in sampling strategies.

On one hand, this makes the method independent of the application, which
can be seen as a positive property. On the other hand, this approach clearly does
not exploit the whole potential of the method, which could make use of
application-specific information. In Section 4 we describe the choice of a sample
suitable for an application during symbolic simulation of a circuit. The best results
were obtained for a sample of 15%-30%. This value may vary with different
implementations and, of course, depends on the set of examples.

Another important question is the choice of the size of the sample. A small 
sample can be reordered fast, but it gives less information about a good varia- 
ble order for the entire OBDDs than a bigger one. The overhead for copying 
and reordering of the sample must be in balance with the quality of the order 
found, i.e., the smaller the sample, the easier the handling, but there will also 
be less information about the OBDDs. An appropriate value of the sample size 
parameter can be derived experimentally. 

Besides the size of the sample, it is also important what portion of the va- 
riables from the support of the entire OBDDs it contains. We do not explicitly 
require a fixed portion of variables, but the choice of the sample is aimed to cover 
most of the variables. A potential extension could be an assignment of weights 
to variables, e.g., the size of the corresponding node levels, that will have an 
influence on the choice of the sample. 

The second question concerning the minimization of the sample partially 
depends on the OBDD package used and the optimization technique for the mi- 
nimization of the sample. Considering the implementation details of the CUDD 
package that we use for our experiments, we chose the following approach: The 
sample is copied and reordered by Sifting. 

The last, but nonetheless important, question to be discussed is how to adapt
the variable order achieved by minimization of the sample to the entire OBDDs.
This question actually consists of two parts: how to derive the new variable
order for the entire OBDDs from the variable order obtained in the sample, and
how to rebuild the OBDDs with respect to this new order. The trivial solution
for the new order is to fix the positions of the variables that do not appear in
the sample and reorder the rest of the variables according to their positions in the
reordered sample. In the very first experiments, we tried to sift the variables
that do not appear in the sample, but only in a restricted manner. Each such
variable was sifted between the two closest positions occupied by variables that
appeared in the sample. This additional reordering did not yield an essential
reduction of the size, and we omitted it from the subsequent experiments.
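The "trivial solution" for deriving the new order can be stated in a few lines. The sketch assumes that orders are given as lists of variable names, topmost variable first; the function name `adapt_order` is hypothetical.

```python
def adapt_order(full_order, sample_order):
    """Derive an order for the entire OBDDs from the reordered sample.

    Variables that do not occur in the sample keep their positions; the
    remaining positions are filled with the sample's variables in the order
    found for the sample.
    """
    in_sample = set(sample_order)
    in_full = set(full_order)
    replacement = iter(v for v in sample_order if v in in_full)
    return [next(replacement) if v in in_sample else v for v in full_order]
```

For example, `adapt_order(["a", "b", "c", "d", "e"], ["d", "a", "c"])` returns `["d", "b", "a", "c", "e"]`: the variables `b` and `e` stay in place and the three sampled variables are permuted among positions 0, 2 and 3.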

There may be also some restrictions about the positions of variables that are 
implied by the meaning of the functions represented by the OBDDs. An example 
for such restrictions in the case of sequential verification is that the present and 
next state variables of a finite state machine should stay together. These must 
be met in the new order, too. 

We can also try to estimate the quality of a new order and to avoid rebuilding 
to poor orders. Our conservative estimation is based on the following assumption: 
If the reordering of a sample did not bring substantial reduction, we do not expect 
that the order obtained substantially reduces the entire OBDDs. 

For reordering to a new order, we use a shuffling procedure that sifts the 
variables upwards to their new positions, starting with the variable that is po- 
sitioned on the topmost level in the new order, and proceeding subsequently 
to lower positions. We have observed that if the size starts to grow, the target 
order is usually not good. Hence, the rebuilding process is stopped whenever the 
OBDD size increases beyond a given factor. The rebuilding approach used has 
an advantage in that we get information about the OBDD size for some other 
orders that appear on the way to the target one. If one of these intermediate 
orders happens to be better than the found one, we sift the corresponding va- 
riables back to their best found position. If the first attempt fails, we decide 
whether we try to reorder again, based on another sample. 
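
The shuffling procedure with its size bound can be sketched as follows. This is a simplified model under stated assumptions: `size_of` is a hypothetical callback that returns the OBDD size for a given order (a real package updates the size incrementally after each swap of adjacent levels), the bound is taken relative to the size before shuffling, and the best order seen is returned instead of sifting variables back.

```python
def shuffle_to_order(order, target, size_of, beta=1.2):
    """Move variables to their target positions, topmost target level first.

    Each variable is sifted upwards by swaps of adjacent levels. The process
    stops as soon as the size exceeds beta times the initial size.
    Returns (best_order, best_size, reached_target).
    """
    order = list(order)
    initial = size_of(order)
    limit = beta * initial
    best_order, best_size = list(order), initial
    for pos, var in enumerate(target):
        i = order.index(var)
        while i > pos:
            # Swap the variable with its upper neighbour.
            order[i - 1], order[i] = order[i], order[i - 1]
            i -= 1
            size = size_of(order)
            if size < best_size:
                best_order, best_size = list(order), size
            if size > limit:
                # The target order is probably poor: stop rebuilding.
                return best_order, best_size, False
    return best_order, best_size, True


# Toy size function (an assumption): the size grows with the distance
# of every variable from its position in one "ideal" order.
IDEAL = ["a", "b", "c", "d"]

def toy_size(order):
    return 10 + sum(abs(order.index(v) - IDEAL.index(v)) for v in IDEAL)

result = shuffle_to_order(["d", "c", "b", "a"], IDEAL, toy_size)
```

Because intermediate sizes are observed anyway, remembering the best intermediate order costs nothing extra in this model.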



4 Experiments 

The method described in the previous section has been implemented in the
Colorado University Decision Diagrams package (CUDD-2.1.2) and used as a dynamic
reordering method for symbolic simulation of the LGSynth91 benchmark circuits
and some circuits contained in the CUDD package. The parameters in CUDD
were left at their default values in all experiments. The results are compared to
the Sifting algorithm with respect to final size and time. All experiments ran on
Pentium Pro 200 machines with 64 MB of memory. In this section, we describe some
of the experiments and, based on their evaluation, we propose an appropriate
parameter setting. The section closes with a particular application of the sample
reordering method to combinatorial verification.

A chosen sample is copied and then reordered by means of Sifting as imple- 
mented in CUDD. The variables in the copy are created in the order of their 
appearance in the copy process. Since we keep the correspondence of the va- 
riables in the sample and in the entire OBDDs, we can easily determine a new 
variable order for the OBDDs from the order in the sample using the strategy 
described in the previous section. Afterwards, the copy can be discarded. 

In order to see whether the idea of sampling works for reordering of OBDDs,
we ran several experiments. The measures of interest are the time and OBDD
size. The method is compared to Sifting (as implemented in CUDD-2.1.2) with
respect to these values. Time is considered more important, as long as
the size remains acceptable. After initial experiments with one-time reordering,
we continued the experiments with dynamic reordering during combinatorial
simulation of the circuits. To give an impression of the usability of the method, we
present the results for dynamic reordering applied to a sample of larger circuits.
Since the number of reorderings done depends on the quality of the variable
order found by the heuristic, this type of experiment gives more information
than a single application of the method.

The first series of experiments was aimed at helping in the choice of the sample
parameters: the sample size $\alpha$, the growth factor $\beta$ that determines the
size bound for shuffling to the new order, and the number $\gamma$ of reordering
attempts allowed for one dynamic invocation of the reordering. We have observed
that 10% to 20% size growth during the shuffling to a new variable order, and 2 to 3
attempts per invocation of the reordering, are satisfactory parameter values. Then
we held these two parameters constant and ran the experiments with varying sample
size. The method was always used for dynamic reordering and for a final reordering
of the OBDDs created by the symbolic simulation of circuits. The order in which the
variables occur in the circuit description determined the initial order in all
experiments. The results clearly showed that while a sample of 10% did not yield
enough information, the overhead for a 60% sample was too large. There was a clear
time saving for sample sizes of 15% to 30%. In addition, the final size of OBDDs
reordered by Sample Reordering was frequently smaller, too.



Sample Reordering Strategy for Combinatorial Verification 

In a particular application of Sampling, we can use additional information
for the choice of a sample. The idea is to prefer some roots as more important
for minimization than others. These roots, distinguished for whatever reason, will
be chosen into the sample with a higher preference. We propose a simple reordering
strategy with an appropriate parameter setting, targeted to symbolic simulation
of circuits:

A small sample is chosen from the newly created roots and copied to be
reordered by Sifting (note that any reordering method can be used here).
The order obtained is applied to the entire OBDDs as described above. If this first
attempt fails to reach an acceptable improvement, the same process is repeated
for a new sample.

Let us go into details. The new roots obtained as results from the Boolean 
operations applied during the symbolic simulation, i.e., OBDDs of some internal 
gates, are pushed onto a stack. Any garbage collection of the unreferenced nodes 
is completed by cleaning the stack. The size of the stack is bounded. Its capacity 
can be set according to the considered application and examples (in the presented 
experiments, we worked with a stack size of 500). The push operation into a 
full stack discards the bottom item. When the sample reordering is invoked, 
the sample is preferably built from the roots in the stack. There are several 
reasons for this: With proceeding computation, the newly created roots represent 
more and more difficult functions. Hence, their minimization is of high priority. 
Secondly, they are assumed to survive longer than those created earlier. And
finally, the last reordering was invoked before these roots existed.
Hence, the current order is with high probability not suitable for them.

If the OBDDs whose roots are in the stack do not suffice to cover the requi- 
rements on the size of the sample, we choose additional roots randomly. Then 
we proceed as described above: The sample is reordered using Sifting and the 
resulting order is used for reordering of the entire OBDDs. If the gain of this 
reordering amounts to at least 30% of the initial size, we stop the reordering and 
continue in the application. Otherwise, i.e., if the reduction does not reach the 
expected value, we try to choose another sample. 
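
The bounded stack of new roots and the choice of a sample can be sketched as follows. The class and function names are hypothetical, the sample size is modelled as a plain sum of the sizes of the chosen OBDDs (sharing of nodes is ignored), and taking the newest roots first is an assumption; the text only states that roots in the stack are preferred.

```python
import random
from collections import deque


class RootStack:
    """Bounded stack of recently created roots (hypothetical helper)."""

    def __init__(self, capacity=500):
        # A deque with maxlen silently drops the oldest (bottom) item
        # when a new item is pushed into a full stack.
        self._items = deque(maxlen=capacity)

    def push(self, root):
        self._items.append(root)

    def clean(self, is_alive):
        """Remove roots that did not survive a garbage collection."""
        kept = [r for r in self._items if is_alive(r)]
        self._items = deque(kept, maxlen=self._items.maxlen)

    def newest_first(self):
        return list(reversed(self._items))


def choose_sample(stack, all_roots, size_of, target_size, rng=random):
    """Fill the sample from the stack first, then with random other roots."""
    sample, total = [], 0
    for root in stack.newest_first():
        if total >= target_size:
            break
        sample.append(root)
        total += size_of(root)
    rest = [r for r in all_roots if r not in sample]
    rng.shuffle(rest)
    for root in rest:
        if total >= target_size:
            break
        sample.append(root)
        total += size_of(root)
    return sample


stack = RootStack(capacity=3)
for root in [1, 2, 3, 4, 5]:
    stack.push(root)
small = choose_sample(stack, [1, 2, 3, 4, 5], lambda r: 10, target_size=20)
```

With capacity 3, the roots 1 and 2 have already been discarded from the bottom of the stack when the sample is chosen.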

The suggested Sampling strategy is evaluated in three series of experiments.
The results are summarized in Table 1. Parameter $\beta$ is set to 1.2, i.e., the
allowed size growth during the shuffling of variables to a new order is 20%, like
the default size growth during Sifting. The size of the sample is set to 25% in all
experiments. The number of attempts is at most 2. In order to avoid using the
same sample repeatedly, which may happen if we work with a constant sample
size parameter, half of the sample is chosen at random in the second attempt. The
first experiment (the column labelled 2x25%) ran as described above. The
second experiment (the column labelled 25%+25%) differs from the first in
that, in the second attempt, we do a conservative pre-estimation of the reduction
reachable by reordering to the new order obtained by the reordering of the
sample. If the reduction of the sample does not reach the expected value of 30%,
then the entire OBDDs are not reordered to the new order. This decreases the
number of shuffle attempts and leads to a further decrease in time. In the first
two experiments, the same method is used for dynamic and final reordering. If
the resulting OBDDs are processed further in subsequent computational steps, e.g.,
if the application continues with reachability analysis in the case of sequential
circuits, then it makes sense to spend more time on reordering at the end of
symbolic simulation. The third experiment differs from the second in that Sifting
is used as the final reordering.
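
The control flow of one invocation of Sample Reordering in these experiments can be summarized by the following sketch. The three callbacks are hypothetical stand-ins for the real operations (choosing a sample, sifting its copy, and shuffling the entire OBDDs), and gains are expressed as fractions of the initial size.

```python
def sample_reordering(pick_sample, sift_sample, rebuild,
                      min_gain=0.30, max_attempts=2, pre_estimate=True):
    """One invocation of the reordering (illustrative sketch only).

    pick_sample(attempt) -> sample
    sift_sample(sample)  -> (order, gain reached on the sample)
    rebuild(order)       -> gain reached on the entire OBDDs
    Returns (number of rebuilds performed, whether min_gain was reached).
    """
    rebuilds = 0
    for attempt in range(max_attempts):
        sample = pick_sample(attempt)
        order, sample_gain = sift_sample(sample)
        # Conservative pre-estimation, applied from the second attempt on:
        # a sample that was not reduced enough is not worth a rebuild.
        if pre_estimate and attempt > 0 and sample_gain < min_gain:
            continue
        rebuilds += 1
        if rebuild(order) >= min_gain:
            return rebuilds, True
    return rebuilds, False
```

Setting `pre_estimate=False` corresponds to the first experiment (2x25%), the default to the second one (25%+25%).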

The number of reorderings during the symbolic simulation of a circuit varied
from 4 to 22 (10 on average) in our experiments. The values of time and size for
Sample Reordering are the average values from 10 runs.

In comparison with Sifting, we have a remarkable saving of time without 
incurring a penalty with respect to the total value of the final OBDD size. 



5 Conclusion 



We propose Sample Reordering as an efficient heuristic for minimization
of OBDDs. The first experimental results with random sampling showed a
remarkable potential of the method. Our current work is focused on the use of
Sample Reordering in particular applications where additional information
about the meaning of the represented functions is exploited for the choice of a
sample.



Sample Method for Minimization of OBDDs 



Table 1. Experiments for sample size of 25%

                Sampling
Circuit              2x25%     25%+25%         25%+25%     Sifting
                                         final Sifting

bw11x11   time      721.04      308.66          388.81     1033.86
          size     150,842     182,269         136,543     285,137
bw8x8     time        5.01        4.22            4.64        6.11
          size       9,641       9,719           8,190       9,050
C499      time       12.87       17.62           25.51       20.14
          size      32,911      44,238          41,900      26,624
C880      time        7.01        4.99            6.90       11.24
          size      13,495      18,665          10,920      10,440
C1355     time       23.76       21.32           24.17       76.01
          size      27,063      29,681          29,192      29,562
C3540     time       91.82       41.87           49.57       46.74
          size      34,286      34,060          31,858      23,950
C7552     time       91.31       47.74           52.92       30.99
          size      69,452      28,440          15,683       8,241
i10       time       26.43       21.62           35.39      174.83
          size      33,351      34,154          32,605      67,971
mm30a     time       18.02       16.42           18.79      137.34
          size      21,548      18,433          17,659     100,591
s13207.1  time       19.94       15.21           16.52       42.02
          size       5,003       6,514           3,158       3,008
s15850.1  time       67.36       68.94           77.19       75.66
          size      27,409      32,241          19,812      12,539
s35932    time       37.97       34.27           42.12       50.84
          size       5,866       5,842           4,987       5,010
s38584.1  time       59.38       54.81           63.37      121.54
          size      28,344      30,860          16,680      15,121
s4863     time       93.54       83.53          131.97      254.03
          size      80,691      82,612          69,476      64,245
s6669     time       52.31       48.74           48.46      111.29
          size      25,626      27,299          22,351      22,109

Total     time       1,328         790             986       2,193
          size     565,528     585,027         461,014     683,598



Abstract. In this paper we discuss an approach to semiautomatic corpus
processing aimed at analysing verb valencies in Czech and subsequently
determining the type of TIL (Transparent Intensional Logic) construction
that belongs to the verb. Obtaining the type of the construction
is a cornerstone of the logical semantic analysis of sentences. TIL is a
highly suitable tool for representing the semantic structure of utterances,
as is shown later in the paper. Our approach is based on the technique
of partial syntactic analysis using a special kind of LALR grammar
processing tool.



1 Introduction 

Several approaches to semantic analysis have appeared during the last decades. Many
authors in computationally oriented semantics work with the assumption that
knowledge of the meaning of a sentence can be equated with knowledge of its
truth conditions: that is, knowledge of what the world would be like if the
sentence were true. Traditionally, first-order predicate logic was used for
the semantic description of language. As Montague [2] showed, this logic system
is able to capture an important range of the constructs, but the range of valid
constructs in natural language is far wider. Montague and his followers tried to
overcome this weakness. However, as Tichý showed in his book, Montague
Semantics can run into severe problems when analysing certain kinds of
sentences which are commonly used in natural language. That is why TIL was
designed to represent the semantic structure of the language by constructions.

TIL, or Transparent Intensional Logic, like Montague Semantics, follows
Frege’s principle of compositionality, i.e. “The meaning of a sentence is a
function of the meanings of its constituents”. The basic idea of TIL lies in the
presupposition that every well-defined language has a definite intensional base
which can be explicated by an “epistemic” framework. Tichý uses an unspecified
epistemic framework with an objectual base E, which is a set of four types that
form the basis of the type hierarchy. Every entity that can be discussed in a natural
language has its equivalent of the appropriate type over the base E. The TIL
object that represents the entity described by the analyzed expression is
referenced not by some sort of name but rather as a construction of the object. The
construction records relations among elementary parts of the discourse (words
or word groups with a special meaning as a whole). That is why constructions
can be advantageously used for expressing the semantics of natural language.

The aim of TIL semantic analysis is to find an algorithm for associating a
language expression with an equivalent construction. There is a three-leg way from
the language expression to the (real world) object it identifies. The first leg,
from the expression to the construction, is the subject of semantic analysis. The
connection between a construction and the constructed TIL object (the second
leg) is always fact-independent; it is directed by the mechanism of typed
lambda calculus and thus it is well defined. The last leg of the journey is (mostly)
dependent on the knowledge of the facts that hold in (and form) the actual world
at the actual time.

In computational linguistics researchers try to devise analytical tools that can
process large amounts of corpus data without the need for human supervision.
Automatic analysis based on TIL needs a translation algorithm that takes
as its input a natural language sentence and outputs the corresponding TIL
construction. The cornerstone of sentence meaning analysis is the semantics of
the verb group with its arguments. Analysis of verb groups is often based
on Fillmore’s semantic cases, verb frames and verb valencies.

Fillmore’s semantic cases and verb frames are not well suited to Czech,
which displays quite a complicated case system (7 cases in both numbers).
In the Czech grammatical tradition, which prefers a dependency-oriented
approach to syntax, valencies are widely used. If we decided to use Fillmore’s
semantic cases, we would have to somehow resolve the conflicts between “deep”
semantic cases and the “real” grammatical cases existing in Czech. Our valency
notation makes it possible to work directly with all 7 cases (nominative, genitive,
dative, accusative, vocative, locative and instrumental).
If there is a further need for semantic specification of the cases, it can be done
by means of the appropriate semantic features and selectional restrictions.



2 Verb Valencies 



In the following text we use the concepts of valency expression and valency 
pattern or valency. Valency expression is a schematic notation of a noun or 
adverb group or a clause, that expresses the requested obligatory attributes of 
the group or clause. Valency pattern for a given verb is formed by a set of valency 
expressions that express a scheme of a semantically correct part of sentence 
which contains the verb and appropriate noun or adverb groups or clauses. For 
example, the verb vyvozovat (infer) has two different valency patterns: 



vyvozovat něco z něčeho        infer something from something
vyvozovat z něčeho, že         infer from something that

The format used for valency representation must be designed so that it complies
with the following requirements:

1. it describes all the syntactic information about the relationship between a verb
and its arguments

2. it is easy to parse with computer tools

3. at the same time it must be effectively decodable by a human

The format we present meets the above requirements. It describes the
valency expression schema using attribute-value pairs. The basic attributes
and their values are listed in Table 1.



Table 1. The basic attributes of the valency notation used

attribute h           attribute c               attribute s           attribute r
type                  case                      clause                preposition
(semantic features)   (grammatical features)    (syntactic features)  (syntactic features)

P, person             1, nominative             I, infinitive         particular
T, thing              2, genitive               C, conj. až           preposition
Q, quality            3, dative                 D, conj. že           in curly
R, reflexive          4, accusative             F, conj. zda          braces
M, amount             5, vocative               P, conj. ať
L, location           6, locative               R, rel. clause
A, direction to       7, instrumental           U, conj. aby
F, direction from                               Z, conj. jak
D, gen. direction
W, time


The transcription of valency patterns for the above-mentioned verb vyvozovat
then looks like this:

vyvozovat <v>hTc4-hTc2r{z},hTc2r{z}-sD

One can make an objection to the readability of the format. Actually linguists 
working with valencies may use the “verbose” format which corresponds to the 
linguistic tradition of valency notation in Czech. Of course, both the formats 
are equivalent to the feature structure representations usually assumed in recent 
grammatical theories. 
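
To illustrate that the compact format is easy to parse with computer tools, the following sketch decodes a transcription line into attribute-value pairs. It is not part of our tools. It assumes that the tag `<v>` separates the verb from its patterns, that patterns are separated by commas and expressions by hyphens, and that prepositions contain neither of these characters.

```python
import re

# One attribute: h, c or s followed by its value, or r{preposition}.
TOKEN = re.compile(r"([hcs])([A-Z0-9]+)|r\{([^}]*)\}")


def parse_expression(text):
    """Decode one valency expression, e.g. 'hTc2r{z}'."""
    attrs = {}
    for key, value, prep in TOKEN.findall(text):
        if key:
            attrs[key] = value
        else:
            attrs["r"] = prep
    return attrs


def parse_valency(line):
    """Decode a line such as 'vyvozovat <v>hTc4-hTc2r{z},hTc2r{z}-sD'."""
    verb, _, rest = line.partition("<v>")
    patterns = [
        [parse_expression(expr) for expr in pattern.split("-")]
        for pattern in rest.strip().split(",")
    ]
    return verb.strip(), patterns


verb, patterns = parse_valency("vyvozovat <v>hTc4-hTc2r{z},hTc2r{z}-sD")
```

For the verb vyvozovat the sketch yields two patterns: a thing in accusative together with a thing in genitive after the preposition z, and the same prepositional group together with a clause introduced by že.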

3 Building a Valency List 

Linguistics has been using the concept of verb valency for a long time, but,
without the advantage of computer tools, work with valencies is a very
lengthy and inevitably incomplete process, the results of which are of informative
value only. At present, new ways of obtaining and exploiting a valency list of a
language are appearing.

1. The first technique of building a list of verb valencies is the “manual”
technique, in which a researcher writes down valencies according to his or her
linguistic knowledge or intuition. Even if this technique may look archaic and
inefficient in the age of computer processing, it is still needed. Until complete
and error-free tools for automatic processing of valencies are developed, the
“manual” technique is convenient for making corrections and additions to
the list or for building the core of the list.

2. The next technique, a good one to begin with when creating a valency list,
consists in taking a list of valencies that can be found in the form of a
dictionary and converting it into electronic form. Although
this technique is a good starting point, some typical difficulties arise in
practice, such as the lack of an electronic version of the printed dictionary
or the inconsistent and out-of-date contents of such a “manually” created list.

3. The third technique is based on exploring a language via its representative, a
text corpus. If the corpus is large enough and exemplifies the language
satisfactorily (which are the assumptions of a well-built corpus), then
this corpus technique is the most accurate of all the stated techniques
of building a valency list. It is highly probable that we can find all (used)
variants of a given verb in the corpus, and it is certain that all valency patterns
obtained from the corpus are up to date, since they are in actual use. An
important feature of this technique is the possibility of obtaining complete results
that do not contain processing errors in a rather short time (when compared
to the “manual” techniques). An initial disadvantage of the corpus technique
is the need for tools that work with raw natural language texts and are capable of
extracting the verb valency patterns from the text using only the knowledge of
grammatical attributes of the words that can be found in a tagged corpus.
If we do not have tools for syntactic analysis or its output available, then
the necessary tools must be relatively sophisticated programs, especially in
the case of variform Slavonic languages such as Czech.

4 The Technique of Partial Syntactic Analysis 

The partial syntactic analysis is conducted by the GC system. This system
works with an LALR(1) grammar that allows shift-reduce conflicts to appear
in any state. Such a conflict is resolved by successive processing of both branches
of the analysis.

The input to GC is essentially a context-free grammar in machine-readable
Backus-Naur Form (BNF). The description of contextual actions connected
to each rule of the grammar contains higher grammatical functions that perform
additional tests. The grammar is entered in this form:

noun-with-proper-names-group -> NOUN
    propagate_all($1)

noun-with-proper-names-group -> proper-name-group
    propagate_all($1)

noun-with-proper-names-group -> NOUN proper-name-group
    agree_case_number_gender_and_propagate($1, $2)

The GC system reads an input sequence of tokens (words tagged by a
morphological analyser) and processes it according to the grammatical rules. If
the input is correct, the system outputs a derivation tree of the given natural
language sentence.

As mentioned above, some pre-defined grammatical tests and procedures
can be used in the description of contextual actions associated with each
grammatical rule of the system. We use the following tests:

- grammatical case test for particular words and noun groups

noun-genitive-group -> noun-group noun-group
    test_genitive($2)
    propagate_all($1)

- agreement test of case in a prepositional construction

prepositional-group -> PREPOSITION noun-group
    agree_case_and_propagate($1, $2)
    add_prep_ngroup($1)

- agreement test of number and gender for relative pronouns

noun-group-with-rel-pron -> noun-group ',' rel-pron-group
    agree_number_gender_and_propagate($1, $3)

- agreement test of case, number and gender for noun groups

adj-noun-group -> adj-group noun-group
    agree_case_number_gender_and_propagate($1, $2)

- test of agreement between subject and predicate

- test of the verb valencies

clause -> subj-part verb-part
    agree_subj_pred($1, $2)
    test_valency_of($2)

The contextual actions propagate_all and *_and_propagate propagate all
relevant grammatical information from the nonterminals on the right-hand side
to the one on the left-hand side of the rule.
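
What an agreement test with propagation may look like is sketched below. This is not the GC implementation: the function `agree_and_propagate`, the representation of a constituent as a list of morphological readings, and the sample readings (an illustrative, incomplete subset for an ambiguous adjective form and a neuter noun) are all assumptions.

```python
def agree_and_propagate(left, right, features):
    """Return the feature combinations shared by both constituents.

    left, right: lists of readings, each a dict such as
                 {"case": 1, "number": "sg", "gender": "n"}.
    An empty result means that the test failed and the rule is rejected;
    otherwise the result is propagated to the left-hand-side nonterminal.
    """
    def key(reading):
        return tuple(reading[f] for f in features)

    common = {key(r) for r in left} & {key(r) for r in right}
    return [dict(zip(features, values)) for values in sorted(common)]


CNG = ("case", "number", "gender")
adj_group = [
    {"case": 1, "number": "sg", "gender": "n"},
    {"case": 4, "number": "sg", "gender": "n"},
    {"case": 2, "number": "sg", "gender": "f"},
]
noun_group = [
    {"case": 1, "number": "sg", "gender": "n"},
    {"case": 4, "number": "sg", "gender": "n"},
]
agreed = agree_and_propagate(adj_group, noun_group, CNG)
```

The agreement removes the feminine genitive reading of the adjective, so the resulting noun group remains ambiguous only between nominative and accusative.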

During the analysis the GC system builds a list of noun groups and adverbial
groups (procedures add_ngroup, add_prep_ngroup and add_adverb_group)
and a list of verb forms (add_verb). The relevant grammatical features of noun
and adverbial groups are extracted and translated into valency patterns of the
verbs found. Optionally, the valencies may be compared with valencies from the
existing list.

5 Assigning TIL Type According to Valencies Found

We use the valency list obtained by means of the GC system when we want to 
find the logical construction that corresponds to the verb meaning. 

Having the valency list we want to find a distribution of all verbs into clas- 
ses of equivalence. As equivalent we regard those verbs whose valency lists are 
similar. The algorithm of finding the similar valency lists for verbs first modifies 
the original valency list. The modifications are as follows: 

1. In the valency list the valency expressions that are formed by a noun group
with a preposition (hPr{} or hTr{}) are (where possible) replaced by one
of the expressions hL (location), hF (direction from), hA (direction to), hD
(way description) or hW (time).

This mechanism is very important since we work with “raw” data from 
syntactic analysis as described in the previous paragraph. Thus the infor- 
mation about location, direction or time is often expressed in the form of a 
noun group with preposition which has to be translated into the correspon- 
ding valency. 

2. The valency expressions of location and time are deleted from the valency 
patterns. The reason for this is that these expressions often represent adjun- 
cts that display circumstantial meaning. 

3. The valency lists for verbs modified in the previous steps are then sorted
and duplicate valency expressions are left out. The resulting valency lists are
then compared.

In such a way it is possible to define a decomposition of the set of verbs into 
classes of equivalence. The verbs in each class then share the same type of logical 
construction. 
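
The three modification steps and the resulting decomposition can be sketched as follows. Several points are assumptions of the sketch: valency expressions are kept as plain strings, the table `PREP_MAP` of prepositional groups is purely illustrative, duplicates are removed both inside a pattern and among patterns, “similar” is simplified to equality of the normalized lists, and the verb names are placeholders.

```python
from collections import defaultdict

# Illustrative mapping only; a real table depends on preposition and case.
PREP_MAP = {"hTc6r{v}": "hL", "hTc2r{do}": "hA"}


def normalize(patterns, prep_map):
    """Apply steps 1 to 3 to the valency list of one verb."""
    result = set()
    for pattern in patterns:
        # Step 1: replace prepositional groups where a replacement is known.
        exprs = [prep_map.get(expr, expr) for expr in pattern]
        # Step 2: delete the expressions of location and time.
        exprs = [expr for expr in exprs if expr not in ("hL", "hW")]
        # Step 3: sort and leave out duplicates.
        result.add(tuple(sorted(set(exprs))))
    return tuple(sorted(result))


def verb_classes(valency_lists, prep_map):
    """Group verbs whose normalized valency lists are equal."""
    classes = defaultdict(list)
    for verb, patterns in valency_lists.items():
        classes[normalize(patterns, prep_map)].append(verb)
    return list(classes.values())


classes = verb_classes(
    {
        "verb_a": [["hPc1", "hTc4", "hTc6r{v}"]],
        "verb_b": [["hTc4", "hPc1"], ["hPc1", "hTc4", "hW"]],
        "verb_c": [["hPc1", "sD"]],
    },
    PREP_MAP,
)
```

Here verb_a and verb_b fall into one class, because their patterns differ only in adjuncts of location and time.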

Transparent Intensional Logic works with a hierarchy of types with the
following four basic types: $\iota$ (individuals), $o$ (truth values), $\tau$ (real
numbers or time moments) and $\omega$ (possible worlds). Other types are then created
as functions from one type to another or as types of higher rank that can range
over constructions. Some important types are $\iota_{\tau\omega}$ (individual role),
$(o\iota)_{\tau\omega}$ (a class of individuals or a property) and
$(o\alpha\beta)_{\tau\omega}$ (an intensional relation between objects of types
$\alpha$ and $\beta$).

If we want to translate a sentence into a construction, we first need to know 
the type of constructions that correspond to particular words in the sentence. 
Among them the construction representing a verb usually forms the basic part of 
the resulting construction and constructions of other words form its arguments. 
To determine the type of the verb construction seems to be more difficult than 
it is perhaps with a noun. 

The classification of verbs described above divides verbs into groups with
the same type of construction. Moreover, it is possible to formulate rules for
deducing the type directly from the valency list for a verb. We derive the type
from the valency list of a verb class in the following way: first we construct the
set of all valency expressions that appear in the valency list for a verb, the
so-called multi-valency. The multi-valency is a schema of all possible expressions
that can be tied to the verb, i.e. the verb “arguments”. It also shows the number
and kind of each argument. We assume that the verb expresses a relation between (at
most) these arguments. In a sentence where some of these expressions are not
present, the corresponding arguments are filled with null values. This approach
makes it possible to fill in the value of an argument that is missing in the sentence
but is known from the preceding text and thus semantically belongs to the verb.
The expressions are translated to verb arguments in the following ways:

1. hQ (property) is regarded as a property of individuals, i.e. an
$(o\iota)_{\tau\omega}$-object.

2. hM (amount) expresses a number of some individuals; it is an extensional
(not dependent on the actual world or time) relation between a number and
an individual or individuals, i.e. a logical object of type $(o\tau\iota)$.

3. hP (person) and hT (thing) can express an individual role or a class of
individuals, thus they have type $\iota_{\tau\omega}$ or $(o\iota)_{\tau\omega}$.
Only during the analysis of a particular
sentence can it be determined which one of these types should be used, and
in some cases it cannot be determined at all since the respective expression
can be ambiguous.

4. hA (where to), hF (where from), hD (which way) and hR (reflexive pronoun)
usually serve as modifiers of the verb meaning. Therefore they do not
change the type of the verb construction; they are functions that yield the
logical object expressing the modified meaning of a verb.

5. All sX expressions refer to another construction, thus they are of a
higher-rank type.
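
The five rules above can be restated as a small lookup. The sketch below is only an illustration: the function name `argument_type` and the textual type descriptions are assumptions, and it expects multi-valency expressions in the compact notation, such as hPTc4 or sI.

```python
import re


def argument_type(expr):
    """Describe how one multi-valency expression enters the verb type."""
    if expr.startswith("s"):
        return "higher-rank type (refers to another construction)"
    match = re.match(r"h([A-Z]+)", expr)
    if match is None:
        raise ValueError("not a valency expression: %r" % expr)
    kinds = set(match.group(1))
    if kinds <= set("AFDR"):
        return "modifier of the verb meaning (verb type unchanged)"
    if kinds == {"Q"}:
        return "(o iota)_tau-omega"
    if kinds == {"M"}:
        return "(o tau iota)"
    if kinds <= set("PT"):
        return "iota_tau-omega or (o iota)_tau-omega"
    # hL and hW were deleted from the valency lists in an earlier step.
    raise ValueError("no rule for expression: %r" % expr)


multi_valency = "hA-hF-hPTc4-hPTc4r{za}-hPTc7r{s}-sI".split("-")
kinds = [argument_type(expr) for expr in multi_valency]
modifiers = [k for k in kinds if k.startswith("modifier")]
```

For this multi-valency two expressions act as modifiers, three denote individual roles or classes of individuals, and one refers to another construction.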

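For example, if we process the valency list of the verb mít (have) with the
algorithm, we obtain the multi-valency hA-hF-hPTc4-hPTc4r{za}-hPTc7r{s}-sI,
which yields the following construction¹:

$$
\begin{array}{l}
\lambda w/\omega.\ \lambda t/\tau.\ \lambda \text{kdo}/I.\ \lambda \text{koho-co}/I.\ \lambda \text{za-koho-co}/I.\ \lambda \text{s-kým-čím}/I.\ \lambda \text{inf}/{*_n}. \\
\quad [{}^0\text{kam}/((o\,{*_n}\,IIII)(o\,{*_n}\,IIII)_{\tau\omega})_{wt} \\
\quad\quad [{}^0\text{odkud}/((o\,{*_n}\,IIII)(o\,{*_n}\,IIII)_{\tau\omega})_{wt} \\
\quad\quad\quad {}^0\text{mít}/((o\,{*_n}\,IIII)_{\tau\omega})_{wt}]]
\end{array}
$$

where $I = \iota_{\tau\omega}$ or $(o\iota)_{\tau\omega}$.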

The construction can be schematically written as

modifier_where_to(modifier_where_from(
    have(
        sb_nomin, sb_st_accus, as_sb_st_accus, with_sb_st_instr, inf
    )
))

The constructions obtained by means of verb valencies represent the way how 
to extract the attributes of the verb meaning from the syntactic structure of the 
sentence. 

¹ The object and variable names in the construction translated to English:
$\lambda w.\,\lambda t.\,\lambda \text{sb\_nomin}.\,\lambda \text{sb\_st\_accus}.\,\lambda \text{as\_sb\_st\_accus}.\,\lambda \text{with\_sb\_st\_instr}.\,\lambda \text{inf}.$
$[{}^0\text{where\_to}_{wt}\ [{}^0\text{where\_from}_{wt}\ {}^0\text{have}_{wt}]]$

6 Conclusions

The most important results lie in the implementation of the algorithm of
partial syntactic analysis of the Czech language that can automatically discover
verb valencies in corpus data. We have also introduced an algorithm for determining
the type of TIL construction associated with the verb meaning according to the
list of its valency patterns. This procedure plays a key role in the system of TIL
semantic analysis.

Abstract. In this paper a part of a system for recognising off-line cursive
Czech text is presented. Recently, various systems for recognition of
cursive English text have been developed; however, to our knowledge no
method has been presented yet for Czech, a language rich in diacritic
marks. This paper deals with preprocessing, which differs for Czech
and English handwritten texts. For finding the letter boundaries, a method
based on minimising a heuristic cost function has been used.

1 Introduction 

Handwritten form of a language is used in notebooks, personal letters, on enve- 
lopes, cheques, etc. Taking into account the possible importance of these docu- 
ments the benefits of automatic recognition of handwritten texts are obvious. 

The problem of handwritten character recognition can be subdivided into
two categories: off-line recognition and on-line recognition. On-line
recognition deals with real-time data processing and has the ability to integrate
pen-movement and pressure information. Off-line recognition, however, is based
on a static input of the data and relies only on pixel information for the
recognition of each word.

Off-line cursive script recognition has progressed in the past thirty years from
a novelty to a technology that can be implemented in commercial applications.
Processing the characters with diacritic marks that are common in Czech,
however, still represents a problem and has not been satisfactorily solved yet. This
paper deals with the preprocessing part of cursive script recognition in which
specific features of a language rich in diacritic marks play the key role.

2 Finding Text Line Boundaries 

Before starting let us define a useful term we will use in the following text: 
Smoothed pixel density histogram s is a histogram defined as follows: 




( 1 ) 



where $h(i)$ is the pixel density histogram at the point $i$.

Splitting the page into rows is the first step which needs specific handling in
Czech. The typical profile of a horizontal smoothed pixel density histogram for an
English text is shown in Fig. 1a. In the case of a Czech text this typical form
is disturbed by the acute accents and inverted circumflexes, which influence
especially the characteristic form in the ascender part (see Fig. 1b). Algorithms
looking for the boundaries of the text rows therefore cannot be based on
searching for characteristic patterns in the horizontal histogram, because the shape
of the upper part of the histogram differs in Czech texts not only according to the
style of writing (the slant of acute accents and the position of the inverted
circumflexes) but also according to the contents of the text (the number of
diacritic marks above the characters in the row).




Fig. 1. Histograms of handwriting: a) in English, b) in Czech



The second problem we deal with in this part is the position of acute accents
and inverted circumflexes above the text. They may be written so high above
the text that a simple algorithm for finding the text rows could consider them
an independent row. We have solved this problem as follows. We suppose that
the rows have approximately the same height on the whole page. After an initial
estimation of text lines, the algorithm joins rows that are too low to the
following row. Based on our experiments it is reasonable to assume
that true rows are only those which are higher than half of the average row.
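
The row-merging heuristic can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: rows are assumed to be given as (top, bottom) pixel intervals from some initial estimate, the name merge_low_rows is hypothetical, and the treatment of a low row at the very end of the page (joined to the preceding row) is our own choice, since the text does not cover that case.

```python
def merge_low_rows(rows, ratio=0.5):
    """Join rows lower than ratio * average height to the following row.

    rows: list of (top, bottom) pixel intervals sorted from top to bottom
    (hypothetical input format). Returns the merged list of intervals.
    """
    if not rows:
        return []
    avg_height = sum(bottom - top for top, bottom in rows) / len(rows)
    merged = []
    pending_top = None  # top of a low row waiting to be joined downwards
    for top, bottom in rows:
        if pending_top is not None:
            top = pending_top
            pending_top = None
        if bottom - top < ratio * avg_height:
            # Too low to be a true row: probably detached diacritic marks.
            pending_top = top
        else:
            merged.append((top, bottom))
    if pending_top is not None:
        # Assumption: a trailing low row has no following row, so it is
        # joined to the preceding one (or kept alone if there is none).
        if merged:
            merged[-1] = (merged[-1][0], rows[-1][1])
        else:
            merged.append((pending_top, rows[-1][1]))
    return merged
```

For example, a thin band of accents detected at rows (0, 8) above a text row (10, 50) is absorbed into a single row (0, 50).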

We have tried to find the borders of a line as precisely as possible. The rows of
a written text in Czech usually do not have straight boundaries. The boundaries
often overlap because of acute accents and inverted circumflexes and because of
characters descending beneath the lower boundary. As a first approximation of
the boundaries, we calculate the horizontal smoothed histogram and estimate
the baselines for all rows. Then a contour following algorithm may be used for
the parts of characters which overlap the lower borderline in order to find precise
row boundaries. Possible problems with some thin and slanted strokes can be
avoided by applying a 3x3 Gaussian mask to the scanned picture of the handwriting
before the algorithm is applied.
3 Splitting Rows To Words

The first phase of row processing is based on finding the reference lines. The
reference lines of a text line are the four horizontal lines that mark the top of
the ascenders, the top of the main bodies of letters, the baseline and the bottom
of the descenders [7]. This phase should not be biased by Czech diacritic marks.
The only exception may be the situation when the acute accents and inverted
circumflexes are positioned too low above the text. In this case it may be
difficult to find the top reference line.

Let us suppose that the slant of the writing on a particular row is known and that
we want to use it for splitting the row into words, which is the logical continuation
of the whole process. The problems which arise are similar to those encountered
when splitting a page into rows, so we can solve them in an analogous way. First we
calculate the vertical histogram at the same angle as the slant of the written
text and smooth it again. Then the spaces between the words may be estimated
using the values of the calculated histogram (text density). This approach works
well if a good estimation method is employed. Erroneously split rows can be
corrected after the words are recognised, by comparing them with a
dictionary.
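
One simple way to estimate the word spaces from the histogram is sketched below. The text does not specify the estimation method, so everything here is an assumption made for illustration: the function name split_words, the rule that a gap is a run of at least min_gap columns whose density does not exceed threshold, and the representation of the histogram as a plain list of numbers.

```python
def split_words(hist, min_gap, threshold=0):
    """Return inclusive (start, end) column ranges of words.

    hist: smoothed vertical histogram taken along the slant (list of numbers).
    min_gap: minimal number of consecutive low-density columns that counts
             as a space between words (assumed parameter).
    threshold: densities <= threshold are treated as empty (assumed parameter).
    """
    words = []
    start = None  # first column of the word being scanned, if any
    gap = 0       # length of the current run of empty columns
    for i, value in enumerate(hist):
        if value > threshold:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                words.append((start, i - gap))
                start = None
                gap = 0
    if start is not None:
        words.append((start, len(hist) - 1 - gap))
    return words
```

Short gaps inside a word (shorter than min_gap) do not split it; only wide gaps do.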

Serious problems can be caused by improper positioning of acute accents and
inverted circumflexes at the end of a word. If they are written too far behind
the text, the algorithm could separate them as an individual word. To avoid this
situation, a horizontal histogram is computed for every short word. It
reveals whether the word found in this way is located on the baseline of the text.
A diacritic mark detected in this way is joined to the preceding word.

4 Extracting Style Parameters of Czech Text 

In the previous section we supposed that the slant of writing is known. The
way to obtain its value, along with other important characteristics of the writing,
is described in the following text.

Histograms are not sufficient for the phase of splitting the words. The inner 
structure of particular words is more complex than the structure of the whole 
page or a row. Therefore, it is necessary to calculate the following parameters 
that characterise the word to be split: 

— dominant slant of writing 

— thickness of the pen 

— average width of characters 

— average height of characters 

It is obvious that the result of splitting the words largely depends on the 
accuracy of determination of particular parameters. To minimise the inaccuracy 
it is necessary to work not only with average values of these parameters but also 
with their deviations and to take them into consideration when the word is split. 
The basic parameter used in handling words is the slant of writing.
All the methods mentioned in the following text are based on it. Let h be a
histogram. Define d(h) as the sum of differences of all pairs of adjacent values in
the histogram h. To determine the slant of writing, the vertical histograms at angles
of -20, -10, 0, 10, 20, 30 degrees are created and the value of d(h) is calculated
for all of them. The angle with the minimum value is considered the dominant
slant.

The vertical histogram for the dominant slant of writing can be used to find
the thickness of the pen. The average thickness of the pen is taken to be the
average of the smallest non-zero elements of the histogram in a given range.
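
A minimal sketch of this estimate is given below. The text does not say how many of the smallest non-zero elements are averaged, so the parameter k, like the function name pen_thickness, is an assumption made for illustration.

```python
def pen_thickness(hist, k=5):
    """Average of the k smallest non-zero histogram values.

    hist: vertical histogram at the dominant slant, restricted to the
          range of interest (list of non-negative numbers).
    k: how many of the smallest non-zero values to average (assumed).
    """
    nonzero = sorted(v for v in hist if v > 0)
    if not nonzero:
        return 0.0
    smallest = nonzero[:k]
    return sum(smallest) / len(smallest)
```

The idea is that columns crossed by a single thin stroke contribute the smallest non-zero counts, so their average approximates the stroke width.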

To determine the average width of characters we calculate the vertical
histogram at the dominant slant angle. Then we search for places where the histogram
value approximately equals the determined width of the pen and the histogram
values are rising on the right side. We calculate the average of the distances
between these points as well as their deviations. Determining the average
width of characters already in this phase is rather difficult and the result is
not fully reliable. Moreover, characters like n, u, m, w, etc. can be misread
as pairs or triples of letters. This problem can be eliminated only in the phase of
recognition and postprocessing.

In the determination of the average height of characters we can considerably
simplify the task by using the height of the highest character as the value
sought.

Vertical histograms are influenced by the diacritic marks much more than
the horizontal ones. The determination of the slant of writing is influenced by
diacritic marks because the slant of the acute accents and the style of writing
of inverted circumflexes considerably affect the histogram computed at different
angles. The style of writing diacritic marks may differ from the writing style of
true letters, and there is a risk that the slant of the marks could override
the true slant of writing in the case of short rows with a relatively large number of
letters with diacritic marks. The determination of the average width of characters
could also be influenced by an improper position of acute accents. The influence
of diacritic marks can be eliminated by a proper heuristic. Therefore, we use
the values based on the histogram of whole rows, in which the deviations are
eliminated by averaging.



5 Finding Letter Boundaries

The process of splitting words can begin after all style parameters are
determined. The influence of diacritic marks is critical in this part of processing.
In this stage we work only with particular words. Therefore, the deviations in
vertical histograms are not eliminated by averaging, so they can lead to
misinterpretation of the data. The slant of acute accents manifests itself as strong
jumps in the histogram, which prevent the algorithm from finding the position of
letters. Put simply, the algorithm designed for splitting letters in English texts
does not work for Czech texts.

A solution to this problem is the temporary elimination of diacritic marks.
It is performed by using an algorithm for finding single graphical objects in the
upper part of the row. Then the original algorithm for splitting words into
particular characters can be applied. However, problems may arise when diacritic
marks are joined back to the letters, because the acute accents and inverted
circumflexes are not written at a constant place with respect to the position
of the letter they belong to. Therefore, we decided not to attach the diacritics
back to the text in this phase but to process them separately. A Czech text in which
diacritic marks are absent is processed in the same way as an English text
(including the recognition of particular letters). Diacritic marks are then joined
to the final text. Re-attachment of diacritic marks based on a language model,
only to the letters where it is possible, considerably improves the usefulness of the
algorithm.

The algorithm for splitting words is described in the following paragraphs [7].
In the first phase the procedure is similar to the determination of the width
of characters. Based on the prevalent slant of writing, we choose a set of four
angles in its neighbourhood and apply the algorithm for the determination of the
width of characters to the histograms at the corresponding slants, with the following
change: when a proper place for division is found, we add its position to the
set M, which is initially empty. This process is repeated for all chosen
angles. The set M can therefore be considered the set of candidates for division
of a given word, and our task is to use the most appropriate ones. It is useful to
order the set M after all candidates are added.

The first point in the ordered set M is chosen as the initial new point of
division. Then we create the set N of points whose distance from the initial point
is less than half of the average width of a character. We remove these points
from the set M. In the next step we remove the point with the smallest
value of the cost function (defined later) from the set N and insert it into the set
Q of the final division points. This process is repeated until the set M is empty.
Finally, we unify the points in the set Q whose mutual distance is less than
a quarter of the width of characters. It is probable that such points correspond
to the same boundary of a character.

The detailed algorithm can be described by the following steps:

1. Find the division point p.

2. Let the threshold point t be at a distance of half of the width of a character
from p.

3. Let the set N be the set of division points between p and t.

4. If there are more division points with the same angle in the set N, delete all
the points after the second one.

5. Choose the division point q with the smallest value of the cost function from
the set N and insert it into the set Q.

6. Unify the points in the set Q whose distance is less than one quarter of the
width of a character.
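
The steps above can be sketched as follows. This is an illustration under several assumptions of ours, not the authors' code: candidates are (position, angle) pairs, the cost function is passed in as a callable cost(angle, position, previous_position), the word is assumed to start at position 0, and "unifying" two close points is interpreted as keeping the earlier one.

```python
def select_division_points(candidates, char_width, cost):
    """Choose final division points Q from the candidate set M.

    candidates: iterable of (position, angle) pairs (the set M).
    char_width: average width of a character.
    cost: callable cost(angle, position, previous_position) -> float.
    """
    m = sorted(candidates)   # the ordered set M
    q = []                   # the set Q of final division points
    prev = 0.0               # assumed position of the start of the word
    while m:
        p = m[0][0]                                  # step 1
        t = p + char_width / 2                       # step 2
        n = [c for c in m if c[0] < t]               # step 3 (p itself included)
        m = [c for c in m if c[0] >= t]
        # Step 4: keep at most two candidates per angle.
        count = {}
        kept = []
        for pos, angle in n:
            count[angle] = count.get(angle, 0) + 1
            if count[angle] <= 2:
                kept.append((pos, angle))
        # Step 5: the cheapest candidate becomes a final division point.
        best = min(kept, key=lambda c: cost(c[1], c[0], prev))
        q.append(best)
        prev = best[0]
    # Step 6: unify points closer than a quarter of the character width.
    unified = []
    for pos, angle in q:
        if unified and pos - unified[-1][0] < char_width / 4:
            continue
        unified.append((pos, angle))
    return unified
```

Because each group N starts at or beyond the previous threshold t, the chosen points come out in increasing order, which is what makes the single left-to-right pass in step 6 sufficient.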
Finally, we need to define the above-mentioned cost function:

$$\mathrm{cost}(a,p) = w_1\left(\frac{p-\mathit{pp}}{\mathit{ew}}\right)^{2} - w_2\left(\frac{p-\mathit{pp}}{\mathit{ew}}\right) + w_3\,(\mathit{tc}) + w_4\,(\mathit{he}) \qquad (2)$$

The function is defined for the pair (a,p), where a is the angle of the division line
and p is the position of the division point on the baseline, pp is the position of the
previous division point on the baseline, ew is the average width of characters,
tc is the number of pixels intersected by the division line normalised by the
average width of the pen, and he is the height of the highest point intersected
by the division line normalised by the height of the row. The experimentally found
constants w1, w2, w3, w4 control the correct division of letter pairs and triplets.
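
As a small worked sketch, the cost can be evaluated as below, assuming it has the form w1*r^2 - w2*r + w3*tc + w4*he with r = (p - pp)/ew, which is how we read formula (2). The function name division_cost and the default weights are placeholders; the experimentally found constants are not given in the text. The angle a enters only through tc and he, which are measured along the division line.

```python
def division_cost(p, pp, ew, tc, he, weights=(1.0, 1.0, 1.0, 1.0)):
    """Evaluate the heuristic cost of a division point.

    p, pp: positions of this and the previous division point on the baseline.
    ew: average width of characters.
    tc: pixels crossed by the division line, normalised by pen width.
    he: height of the highest crossed point, normalised by row height.
    weights: (w1, w2, w3, w4); placeholder values, not the authors' constants.
    """
    w1, w2, w3, w4 = weights
    r = (p - pp) / ew  # distance from the previous cut in character widths
    return w1 * r ** 2 - w2 * r + w3 * tc + w4 * he
```

Under this reading the distance part w1*r^2 - w2*r is smallest at r = w2/(2*w1), so the ratio of the first two weights determines the preferred spacing of cuts, while w3 and w4 penalise division lines that cross much ink or tall strokes.
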
Abstract. We present a method for automated theorem proving in a
combination of theories with disjoint signatures. The Nelson-Oppen
combination technique for decision procedures is used to combine separate
theorem provers in different theories. The provers being combined are
based on the Prolog Technology Theorem Proving method and they use
SLD resolution (alternatively Model Elimination) as an inference
system. Our approach makes it possible to tune the provers for different theories
separately and increases the efficiency of automated theorem proving in
a combination of theories.



1 Introduction 

The specifications of hardware and software systems often involve huge sets of
axioms, and that is why such systems are often described in a modular
fashion, using structured specifications. Theorem proving is needed
to support the process of developing such systems from their formal
specifications. The modularity of the structured specifications may be used in
the process of theorem proving, as it makes it possible to prove something in the
context of a given subspecification without considering the whole structured
specification. Although such a process typically requires user interaction, some
parts of it can be automated.

Since first order automated theorem proving can be used to support this
process of systems development, it is desirable to study the techniques
of automated theorem proving in structured specifications, so that the automated
theorem prover can benefit from the modularity of the theory.

This paper deals with the automation of the process of proving in a
combination of theories, keeping the proving activities in the combined theories
separated. We show that this approach can improve the efficiency of automated
theorem proving in the combination of theories in comparison with the method
of flattening the theory, in which the axioms of the combined theories are treated
as one set.

The paper is organized as follows: Section 2 describes the Nelson-Oppen
combination technique for decision procedures. In section 3, we present an algorithm
that combines this technique with SLD resolution. The extension of the
algorithm to non-Horn theories is briefly discussed in section 4. In section 5 we
present some experimental results, and the last section concludes.
2 The Nelson-Oppen Combination Technique

In this section, we describe the Nelson-Oppen combination technique for
decision procedures [5,12,21]. The method integrates decision procedures for
quantifier-free decidable theories (denoted Ti) with disjoint signatures (sig(Ti) ∩
sig(Tj) is empty for i ≠ j) into a decision procedure for their union.

We say that t is an i-term if it is a variable or it has the form f(s) and
f ∈ sig(Ti). A subterm s of t is an alien subterm if t is an i-term and s is a maximal
subterm of t such that s is a j-term for j ≠ i. If the i-term t contains only
symbols from sig(Ti) and variables, we call it a pure (i-)term. These definitions
can be straightforwardly extended to atomic formulas (we can separate them
into theories using the predicate symbol instead of the top function symbol of a
term). We will refer to the theories Ti as basic theories.

To decide a formula F in a combination of theories T = ⋃i Ti with the above
stated properties:



1. Convert the formula ¬F into a disjunctive normal form. To determine the
validity of F, it is sufficient to show that each conjunction is unsatisfiable.

2. In the conjunction, homogenize each literal (based on its predicate, or its top
term for equality) so that alien subterms are replaced by variables and an
appropriate equality is added to the conjunction. Precisely, for a literal L with
a predicate belonging to the theory Ti (or a top term function symbol
belonging to Ti in case the literal is an equality), find in each argument the (largest)
subterm t with the top function symbol f such that f does not belong to the
signature of Ti. This subterm t is replaced by a new variable x (we will call
these newly introduced variables shared variables) and an equality x = t is
added to the conjunction. This process is applied recursively and to all the
literals, so that after the homogenization each literal contains just predicate
and function symbols from one theory (we will say that the literal belongs
to Ti). (See the example in Fig. 1.)

3. Split the resulting conjunction φ into conjunctions φi such that exactly the
literals from φi belong to Ti.

4. If φi is unsatisfiable in Ti for some i, then φ is unsatisfiable in T.

5. If a disjunction of equalities for shared variables x1 = y1 ∨ ... ∨ xk = yk
can be deduced from φi in Ti, propagate the equalities into the other theories.
(Whether it can be deduced is decided using the decision procedure
for φi ∧ x1 ≠ y1 ∧ ... ∧ xk ≠ yk. Further, there are finitely many shared
variables and therefore finitely many possible disjunctions of equalities, i.e. the
problem of finding all the disjunctions is decidable.) Precisely, for each disjunct
xj = yj, add xj = yj to φk for every k and apply this procedure recursively
(from point 4). If for all disjuncts the result is unsatisfiable, then φ is
unsatisfiable in T. If there are no disjunctions that can be deduced from any φi
in Ti (and point 4 did not determine unsatisfiability), or for some deduced
equality xj = yj, φk ∧ xj = yj is satisfiable for all k, then φ is satisfiable in T.
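
The homogenization of step 2 can be sketched as follows. The representation is our own assumption, chosen only for illustration: a variable is a string, a compound term is a tuple (symbol, arg1, ...), an atom is a tuple (predicate, arg1, ...), and theory_of maps each predicate and function symbol to the index of its theory. The sketch handles only non-equality input atoms and ignores negation; the names homogenize and v1, v2, ... are hypothetical.

```python
import itertools


def homogenize(conjunction, theory_of):
    """Replace alien subterms by fresh shared variables.

    Returns a list of pure atoms; each introduced equality is written
    ('=', x, t), where t is pure in the theory of its top symbol.
    """
    fresh = (f"v{i}" for i in itertools.count(1))  # shared variables
    out = []

    def abstract(term, owner):
        # Return a term that is pure in theory `owner`.
        if isinstance(term, str):          # a variable is pure everywhere
            return term
        symbol, *args = term
        home = theory_of[symbol]
        pure = (symbol, *[abstract(a, home) for a in args])
        if home == owner:
            return pure
        x = next(fresh)                    # alien subterm: name it ...
        out.append(("=", x, pure))         # ... and record x = t
        return x

    for predicate, *args in conjunction:
        owner = theory_of[predicate]
        out.append((predicate, *[abstract(a, owner) for a in args]))
    return out
```

For instance, with q and p in theory 1 and f in theory 2, the atom q(f(x), f(f(f(x)))) becomes q(v1, v2) together with v1 = f(x) and v2 = f(f(f(x))). Each occurrence of an alien subterm gets its own variable, as in the example of Fig. 1; identifying variables that name equal terms is left to the equality propagation of step 5.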



In the example (Fig. 1), we can see a case in which the propagation of a
disjunction of equalities between theories is necessary to show unsatisfiability.
T1 = {∀x,y (q(x, y) ∧ (x = y) ⇒ p(x))}

T2 = {∀x (f(f(x)) = f(x) ∨ f(f(f(x))) = f(x))}

T = T1 ∪ T2

We want to decide if: T ⊢ q(f(x), f(f(f(x)))) ∧ q(f(x), f(f(x))) ⇒ p(f(x))

1. negation: q(f(x), f(f(f(x)))) ∧ q(f(x), f(f(x))) ∧ ¬p(f(x))

2. homogenization: q(u, v) ∧ u = f(x) ∧ v = f(f(f(x)))
   ∧ q(u', v') ∧ u' = f(x) ∧ v' = f(f(x)) ∧ ¬p(w) ∧ w = f(x)

3. φ1 = q(u, v) ∧ q(u', v') ∧ ¬p(w),
   φ2 = (u = f(x) ∧ v = f(f(f(x))) ∧ u' = f(x) ∧ v' = f(f(x)) ∧ w = f(x))

4. T2, φ2 ⊢ w = u

5. T2, φ2 ⊢ w = u'

6. T2, φ2 ⊢ u = v ∨ u' = v'

7. T1, φ1, w = u, w = u', u = v is unsatisfiable

8. T1, φ1, w = u, w = u', u' = v' is also unsatisfiable and therefore
   T, q(f(x), f(f(f(x)))) ∧ q(f(x), f(f(x))) ∧ ¬p(f(x)) is unsatisfiable, i.e.
   T ⊢ q(f(x), f(f(f(x)))) ∧ q(f(x), f(f(x))) ⇒ p(f(x))

Fig. 1. Example of the use of Nelson-Oppen combination method



The disjunction of equalities causes branching (called a split in [12]) in the
algorithm described above and therefore has an impact on its efficiency. It is often
the case that only single equalities (and not disjunctions with two or more
members) can be deduced from the theories. Such theories are called convex
theories.

The algorithm given above is complete under a further assumption on the
theories: the theories Ti have to be stably infinite, which means that every
formula F is satisfiable in Ti iff it is satisfiable in some infinite model of Ti.

3 Combination of Nelson-Oppen Procedure with SLD 
Resolution 

The combination techniques for decision procedures over a union of theories can
be rather straightforwardly extended to theorem proving in a combination of
theories. We will require that the combined theories have disjoint signatures
(the result can be extended to the case when they share constant symbols).
We have to overcome several difficulties to use theorem
proving procedures instead of decision procedures:

1. As the theories considered may be undecidable, one has to cope with the
fact that the theorem proving procedures need not terminate, in contrast
with the decision procedures.

2. Proofs have to be assembled into a proof of the formula being proved.
Another question arises in the context of a concrete deterministic algorithm:
how do we determine which equalities among shared variables we should try to
solve (and in which theory)?
We will concentrate on Horn theories, where SLD resolution (a goal-driven
strategy used in logic programming, also called backward chaining) is complete.
For Horn theories, there can be no splits in the Nelson-Oppen algorithm, i.e.
Horn theories are convex. The reason is that for a Horn theory T, we have the least
Herbrand model Mh of T, and therefore a proper disjunction of equalities (i.e.
such that only one equality is valid in some models and only the second one in
other models) cannot be a consequence of T.

Our algorithm uses iterative deepening search, i.e. the depth of a proof is
restricted when searching for the proof. If the proof cannot be found within
a given depth bound, the depth bound is increased for the next attempt.
This is how we overcome the potential undecidability of the theory.
The propagation of equalities among the underlying provers for different theories is
ensured by a pool, where the proven equalities are stored and which is accessible
from all theories. As the underlying prover, we use a procedure realizing SLD
resolution with bounded depth of search and with unit-lemma caching.
We use some Prolog notation in the following text; in particular, we denote
variables by uppercase letters and we speak about facts (unit clauses) and rules
(non-unit clauses) of a theory.
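
The idea of depth-bounded backward chaining with iterative deepening can be illustrated by the following sketch. It is deliberately simplified and is not the prover described here: it works on propositional Horn clauses only (no unification, no caching), the names prove and prove_iterative are hypothetical, and the cap max_depth is our own addition so that the sketch terminates on unprovable goals, whereas the procedure in the text keeps increasing the bound.

```python
def prove(goal, facts, rules, depth):
    """Depth-bounded backward chaining for propositional Horn clauses.

    facts: set of atoms; rules: dict mapping a head atom to a list of
    bodies, each body being a list of atoms.
    """
    if goal in facts:
        return True
    if depth == 0:
        return False
    return any(
        all(prove(atom, facts, rules, depth - 1) for atom in body)
        for body in rules.get(goal, [])
    )


def prove_iterative(goal, facts, rules, max_depth=10, step=1):
    """Retry with an increasing depth bound; return the bound that worked."""
    depth = 0
    while depth <= max_depth:
        if prove(goal, facts, rules, depth):
            return depth
        depth += step
    return None  # not proved within max_depth (assumed cut-off)
```

Because every attempt is bounded, a looping rule such as d :- d cannot make a single attempt run forever; it merely fails at each bound.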

Since the theorem prover based on SLD resolution is goal-driven, it is natural
to generate equalities between shared variables that would help to solve the goal
and to store them in a second pool, where they can form goals for proving in
other theories. In these pools, the goals are stored along with their proofs. We
will call the pools Pool-done (contains proved literals) and Pool-todo (contains
equalities inferred by backward chaining as subgoals).

In the process of homogenization, we use terms of the form v(x), v(y) etc.
instead of variables (here x and y denote constants). The reason is that we have to
distinguish shared variables from the original variables. Our algorithm solve_NOP
extends the underlying prover for unstructured theories:

The procedure solve_NOP(Thy, Goal):

1. Do with a given depth limit for all proofs:

2. If a (homogenized) literal is to be proved in a combination of basic
theories (Thy), select the theory to which it belongs (using its signature) and try
to prove it using the underlying prover (with the proper facts and rules). The
literals from Pool-done are accessible as facts in all basic theories.

3. If a literal of the form (i.e. unifiable with) eq(v(X),v(Y)), that is, an
equality between shared variables, is the original goal or is encountered as a
subgoal, then

a) If the literal eq(v(x),v(y)) is on the Pool-done, then the subgoal is solved.

b) If the literal eq(v(x),v(y)) is not on the Pool-todo, then insert it into this
Pool and call solve_iter_pool.

c) If the literal eq(v(x),v(y)) is on the Pool-todo and not on the Pool-done,
then call solve_iter_pool.

4. If the original literal was not solved (step 2 did not succeed and the literal
is not on the Pool-done), increase the depth bound (add a positive number to the
current depth bound) and go to 1.
Procedure solve_iter_pool goes over all goals on the Pool-todo and over all
combined theories and tries to solve the goals with a given depth limit until both
Pool-done and Pool-todo become stable.

The procedure solve_iter_pool:

1. repeat

2. for all Goal ∈ Pool-todo do

3. for all Theory ∈ Combined Theory do

4. solve_NOP(Theory, Goal), but do not call solve_iter_pool recursively (!)

5. if Goal was solved then move it from Pool-todo to Pool-done (along with its
proof)

6. od

7. od

8. until there is neither a change on Pool-todo nor any new literal on Pool-done
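
The fixpoint iteration over the two pools can be sketched as follows. This is a schematic illustration only: the pools are plain Python sets of goals (assumed here to be strings), proofs are not stored, and the prover is abstracted into a callable prove(theory, goal, pool_todo, pool_done), a hypothetical stand-in for the non-recursive call of solve_NOP that may add newly encountered equalities to pool_todo.

```python
def solve_iter_pool(pool_todo, pool_done, theories, prove):
    """Iterate over all goals and theories until both pools are stable.

    pool_todo, pool_done: sets of goals, modified in place.
    theories: list of theory objects understood by `prove`.
    prove: callable(theory, goal, pool_todo, pool_done) -> bool; it may
           add new subgoals to pool_todo as a side effect.
    """
    while True:
        before = (frozenset(pool_todo), frozenset(pool_done))
        for goal in sorted(pool_todo):       # iterate over a snapshot
            for theory in theories:
                if prove(theory, goal, pool_todo, pool_done):
                    pool_todo.discard(goal)
                    pool_done.add(goal)
                    break
        if (frozenset(pool_todo), frozenset(pool_done)) == before:
            return
```

Comparing snapshots of both pools implements the "until" condition of line 8 directly: the loop stops only after a full pass that neither solved a goal nor produced a new one.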

The underlying prover with the depth restriction will always terminate, and
it is correct and complete for Horn theories. The procedure solve_NOP without the
call to solve_iter_pool will therefore also halt. Then the solve_iter_pool procedure
must terminate as well, as there are only finitely many shared variables and
therefore finitely many states of the pools. From that we can immediately conclude
that solve_NOP (including the recursive calls to solve_iter_pool) terminates for
a given depth limit (i.e. with the depth limit bounded in step 4, steps 2
and 3 will always terminate). For the proof of completeness, we assume that we
combine two theories T1 and T2 (the proof for more theories being an obvious
generalization). The notation T1 ∪ T2 ▷ A will in the following mean that the
algorithm finds a proof of a positive literal A in a theory which is the
combination of basic theories with axioms T1 and T2. T1 ∪ T2 ⊢ A means that
A is provable from the union of axioms T1 ∪ T2 using SLD resolution.

Theorem 1. The above stated algorithm is sound and complete for theorem
proving in a union of Horn theories with disjoint signatures, i.e. T1 ∪ T2 ▷ A if
and only if T1 ∪ T2 ⊢ A. We assume that the two theories T1 and T2 fulfill the
requirements of the algorithm solve_NOP, i.e. they have disjoint signatures,
they are stably infinite and their union is consistent.

Informally: We will denote by Eq the contents of Pool-done. From the completeness
of backward chaining and from the axioms for equality, when it is possible to prove
Ti ∪ Eq ⊢ A, we will get Eq as a subgoal using SLD resolution (as it will be in
the body of some clause from Ti or some clause describing equality axioms). The
subgoals Eqk ∈ Eq will get to the Pool-todo and will trigger the solve_iter_pool
procedure (even in the case that the goal is already on the Pool). The solve_iter_pool
procedure will solve Eqk if it is possible in any Tj wrt. a given depth limit. While
new equations are encountered as subgoals during solve_iter_pool, the procedure
is repeated, and therefore all possible equations among shared variables,
deducible from Tj within the given depth limit, that can help to establish T ⊢ A
(using rules and facts from Tj) are proved and stored on Pool-done after the
run of solve_iter_pool. As all the equalities stored on Pool-done are accessible to
the provers in all theories, completeness wrt. a given depth limit follows from the
completeness of the Nelson-Oppen technique and of the underlying prover.

Formally: We want to prove: if T1 ∪ T2 ⊢ A then T1 ∪ T2 ▷ A. As T1 ∪ T2 ⊢ A,
from the completeness of SLD resolution there exists an SLD-refutation of ¬A from
T1 ∪ T2. We will represent this refutation in the form of a proof tree with the root
A, in which every branching corresponds to an application of a rule (with the proper
substitution propagated up through the tree). We will consider the
whole tree with the resulting answer substitution δ applied to its nodes. The
leaves of the tree correspond to the successful unifications with the facts (see Fig.
2). We will also assume that we run our algorithm with a depth limit big enough
to construct the tree (that will always happen, as the depth limit is iteratively
increased until the goal is proved).



Fig. 2. A proof tree for goal A



Our proof is based on the fact that the only nodes that can switch between
theories are the nodes containing an equality between shared variables. More
precisely, if there are two nodes on a path from the root that contain literals
expressed in different signatures sig(Ti) and sig(Tj) for i ≠ j, then there has to be a
node containing an equality between shared variables between these two nodes.
The reason is that the signatures of the theories are disjoint (except for the
predicate symbol of equality). We will prove the theorem using induction on the
height of the highest subtree of the proof tree that contains an equality between
shared variables in its root.

1. If there is no equality between shared variables in the tree, then all the
literals in the tree belong syntactically to the same theory Ti, given by A.
Hence Ti ⊢ A and Ti ▷ A, i.e. our algorithm proves A (as the SLD resolution
in a single theory is built into it; see solve_NOP, step 2).

2. If there is an equality between shared variables in a leaf of the tree, this
equality has to be a fact in some Ti. In the SLD resolution process, this
equality will be established as a subgoal and thus it will get to Pool-todo (in
step 3 of our algorithm). After the call to solve_iter_pool it will get to
Pool-done and the algorithm succeeds for that subgoal (step 4). Hence, if the tree
contains equalities between shared variables just in the leaves, the algorithm
will succeed for goal A by the same arguments as in the previous point,
augmented by the fact that the equalities in the leaves will be proved.

3. If there is a highest subtree of height k with an equality between shared
variables (X = Y) in the root of the proof tree, we will prove that the
algorithm succeeds by induction on k. We got the result for k = 1 in point
2. As we have a complete inference mechanism (the underlying prover, SLD
resolution) built into our algorithm, we get some instantiation of the root
equality literal as a subgoal (X = Y)σ. This subgoal gets to Pool-todo in
step 3 of our algorithm, and solve_iter_pool is called, which tries to solve this
subgoal in both theories using a recursive call of the algorithm. The algorithm
will succeed on the subgoal and an instantiation (X = Y)δ will get to
Pool-done by the induction assumption (in the proof tree of the goal there can be
only subtrees with an equality in the root whose height is less than k).

When the subgoal (X = Y)δ gets to the Pool-done, the algorithm succeeds
on A (point 1 of the proof). In point 3 of our proof, it is necessary to consider
the fact that solve_iter_pool is not called recursively, which seems to damage
the induction assumption. Nevertheless, the recursive call to solve_iter_pool is
replaced by the iteration of solve_iter_pool until Pool-todo becomes stable. Correctness
of the algorithm solve_NOP follows directly from the fact that every derivation
computed by solve_NOP is also an SLD derivation in T1 ∪ T2.



4 Non-Horn Theories

The algorithm solve_NOP can be extended to cope with non-Horn theories.
It can be combined with the method of Model Elimination (resp. its
variant Restart Model Elimination [2]) in order to become complete for the
combination of any first order quantifier-free theories with disjoint signatures.
The basic idea of this extension is to store equalities on the pool along with the set
of their ancestors. The context of a goal, formed by its ancestors, can be used
to perform reduction steps that extend the SLD inference mechanism to become
complete. Due to the lack of space we cannot describe this method more
thoroughly here; the details are given elsewhere.

5 Experimental Results 

We have undertaken several experiments with our prover, based on the ideas
described in the previous sections. The prover was implemented in Prolog,
following the ideas of the Prolog Technology Theorem Prover [14, 18, 19] with
caching [14, 21]. We used examples from the domain of formal methods: we proved
some statements in specifications of data structures such as lists or arrays,
combined with theories describing linear arithmetic and equality for
uninterpreted function and predicate symbols. Tasks 1 and 2 come from [?], where
they are solved using the Nelson-Oppen technique and decision procedures; task 3
is based on a structured specification from [?]. For the exact axiomatization
see [?]. As an example, we present task 2, which was to prove:

(x ≤ y ∧ y ≤ x + Head([0, x]) ∧ P(f(x) - f(y))) ⇒ P(0)

where P and f are uninterpreted predicate and function symbols, respectively,
and Head denotes the head of a list.
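
An informal reading of why this statement holds, in the spirit of equality
propagation between theories, is sketched below. The order of the steps and the
theory labels are our own illustration. We assume that the comparisons are
non-strict and that the list theory yields Head([0, x]) = 0; the exact
axiomatization used in the experiments is not reproduced here.

```latex
\begin{align*}
&\text{lists:} && \mathit{Head}([0, x]) = 0 \\
&\text{arithmetic:} && x \le y \;\land\; y \le x + 0 \;\Rightarrow\; x = y \\
&\text{uninterpreted symbols:} && x = y \;\Rightarrow\; f(x) = f(y) \\
&\text{arithmetic:} && f(x) = f(y) \;\Rightarrow\; f(x) - f(y) = 0 \\
&\text{uninterpreted symbols:} && P(f(x) - f(y)) \;\land\; f(x) - f(y) = 0 \;\Rightarrow\; P(0)
\end{align*}
```

Each line uses the symbols of a single theory once the mixed subterms are
replaced by shared variables, and only equalities pass from one line to the
next.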

We compared our (structured) approach with the case where the structure
was flattened, i.e. the axioms of all the theories being combined were grouped
together into one resulting theory. The results showed that theorem proving
in structured specifications can benefit from keeping the structure even when
the theorem provers for the separate theories are not specifically tuned (using
different heuristics and/or term weights etc.).

Table 1 summarizes three parameters for each task: the runtime, the number
of inferences and the number of database hits. An inference is counted
whenever some literal is inferred, i.e. when solve(.., Subgoal, ..)
returns some instance of Subgoal. The number of hits summarizes all accesses
to the database of input rules and facts and to the cache. It is incremented
whenever a rule (or a fact) or a subgoal instance is retrieved from the database
or cache in an attempt to solve a subgoal.
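
The two counters can be illustrated on a toy propositional backward chainer
with a success cache. This is a sketch of our reading of the definitions above,
not the instrumentation of the actual prover: the function solve, the rule
format and the decision to count a successful cache lookup as an inference are
assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, List[str]]   # (head, body); a fact has an empty body


def solve(goal: str, rules: List[Rule], cache: Set[str],
          stats: Dict[str, int]) -> bool:
    """Propositional backward chaining with a success cache (toy, no loop check)."""
    if goal in cache:
        stats["hits"] += 1            # subgoal instance retrieved from the cache
        stats["inferences"] += 1      # solve returns an instance of the subgoal
        return True
    for head, body in rules:
        if head != goal:
            continue
        stats["hits"] += 1            # rule or fact retrieved from the database
        if all(solve(b, rules, cache, stats) for b in body):
            cache.add(goal)
            stats["inferences"] += 1
            return True
    return False


# a :- b, c.    b.    c :- b.
rules = [("a", ["b", "c"]), ("b", []), ("c", ["b"])]
stats = {"inferences": 0, "hits": 0}
proved = solve("a", rules, set(), stats)
```

For this input the second attempt to solve b is answered from the cache, so it
adds one hit without retrieving any rule.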



Table 1. Experimental results

            |         solve_NOP          |         flattened
  Task no.  |  time  inferences    hits  |  time  inferences    hits
 -----------+----------------------------+----------------------------
      1     |    24        1625    7827  |    20         129    3483
      2     |   314        7343   57064  | 1200*        4701   63557
      3     |    35        2237    9293  |   105        1409   14917

* The computation was aborted when the time limit was reached.



The results (Table 1) show that in all cases the number of inferences is lower
for the flattened form of the input. The reason is that, when the Pool is used,
attempts to solve the equalities are repeated in different theories. On the
other hand, for the more difficult tasks (2 and 3) the algorithm solve_NOP is
better both in time and in the number of hits. The reason for this behavior can
be found by examining the cached solutions. The number of cached solutions
is summarized in Table 2.
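
To make the comparison concrete, the ratios flattened / solve_NOP can be
recomputed from the figures of Table 1. The short Python fragment below is only
a convenience sketch; the names solve_nop, flattened and ratios are ours. A
ratio above 1 favours solve_NOP. For task 2 the flattened run was aborted, so
its time ratio is only a lower bound.

```python
# (time, inferences, hits) per task, copied from Table 1
solve_nop = {1: (24, 1625, 7827), 2: (314, 7343, 57064), 3: (35, 2237, 9293)}
flattened = {1: (20, 129, 3483), 2: (1200, 4701, 63557), 3: (105, 1409, 14917)}


def ratios(task: int) -> tuple:
    """Return flattened / solve_NOP for time, inferences and hits."""
    return tuple(f / s for f, s in zip(flattened[task], solve_nop[task]))


for task in sorted(solve_nop):
    time_r, inf_r, hits_r = ratios(task)
    print(f"task {task}: time x{time_r:.2f}, "
          f"inferences x{inf_r:.2f}, hits x{hits_r:.2f}")
```

The inference ratio is below 1 for every task, while the time and hit ratios
exceed 1 for tasks 2 and 3, which matches the discussion above.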

The number of cached solutions is significantly smaller in the case of
solve_NOP, and this leads to the better results of our algorithm compared with
the underlying prover on the flattened form of the input. The reason for the
smaller number of cached solutions is the syntactic restriction on terms (and
literals) imposed by solve_NOP. Mixed terms and literals containing function
symbols from different theories may not be formed, which restricts both the
search space and the cache size.

Table 2. Cache content

            |      Cache content
  Task no.  |  solve_NOP   flattened
 -----------+------------------------
      1     |         20          24
      2     |        175        444*
      3     |         27         141

* The computation was aborted when the time limit was reached.



As an example, consider the equality f(v(y)) = f(v(y) + 0), which
is inferred in the flattened form of task 2. It says that applying any
function f to a shared variable v(y) gives the same result as applying it to
v(y) + 0, which follows from the fact that v(y) = v(y) + 0. The algorithm will
also infer and store in the cache many other similar equalities, such as
f(v(y)) = f(v(y) + 0 + 0). On the other hand, none of these equalities can be
inferred by the solve_NOP algorithm, as they use symbols from more than one
theory (0 and + from the theory of linear arithmetic and f from the theory of
uninterpreted function symbols).

The extra inferred literals are properly typed, and therefore a similar effect
cannot be achieved using type control. Our algorithm solve_NOP restricts
the connection between these theories to shared variables; in that way it
avoids repeated work, as it allows only "syntactically pure" literals to be
inferred.
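
The notion of a syntactically pure literal can be made explicit with a small
check on term structure. The Python fragment below is a sketch under our own
assumptions: terms are encoded as nested tuples, the shared variable v(y) is
treated as an atomic name, and the names SIGNATURES, symbols and
is_pure_equality are hypothetical rather than part of solve_NOP.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

Term = Union[str, Tuple]   # a variable name, or (symbol, arg1, ..., argn)

# Hypothetical disjoint signatures for the two theories in the example.
SIGNATURES: Dict[str, FrozenSet[str]] = {
    "linear_arithmetic": frozenset({"0", "+"}),
    "uninterpreted": frozenset({"f"}),
}


def symbols(term: Term) -> Set[str]:
    """Collect the function symbols occurring in a term."""
    if isinstance(term, str):          # shared variables carry no symbol
        return set()
    head, *args = term
    found = {head}
    for arg in args:
        found |= symbols(arg)
    return found


def is_pure_equality(lhs: Term, rhs: Term) -> bool:
    """True if all symbols of lhs = rhs come from a single signature."""
    used = symbols(lhs) | symbols(rhs)
    return any(used <= sig for sig in SIGNATURES.values())


vy = "v_y"                              # stands for the shared variable v(y)
zero = ("0",)
plus = ("+", vy, zero)                  # v(y) + 0

pure_arith = is_pure_equality(vy, plus)               # v(y) = v(y) + 0
mixed = is_pure_equality(("f", vy), ("f", plus))      # f(v(y)) = f(v(y) + 0)
```

Here pure_arith is True and mixed is False: the second equality combines f with
0 and +, so a prover restricted to pure literals never forms it.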

6 Conclusion 

We presented a method of automated theorem proving in a combination of
theories with disjoint signatures that keeps the proving activities in different
theories separate. The method is based on the propagation of equalities between
shared variables, in the spirit of the Nelson-Oppen combination technique for
decision procedures. Our method is useful from two points of view: it can be
used to tune the provers for different theories separately, and even when
provers with the same setup are used in all basic theories, it increases the
efficiency of theorem proving.

In the future, we would like to extend this technique to many-sorted
first-order logic. Further, our method could provide a basis for the integration
of theorem proving and decision procedures [?]. We are also interested in
automated theorem proving in general structured specifications, i.e. including
operations like renaming and parameterization [?] that are not used here.